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Abstract From as early as the 1930s, astronomers have tried to quantify the statistical na- 
ture of the evolution and large-scale structure of galaxies by studying their luminosity distri- 
bution as a function of redshift - known as the galaxy luminosity function (LF). Accurately 
fT^ ' constructing the LF remains a popular and yet tricky pursuit in modern observational cos- 

mology where the presence of observational selection effects due to e.g. detection thresholds 
in apparent magnitude, colour, surface brightness or some combination thereof can render 
any given galaxy survey incomplete and thus introduce bias into the LF. 
u: Over the last 70 years there have been numerous sophisticated statistical approaches 

(— I I devised to tackle these issues; all have advantages - but not one is perfect. This review 

^ |. takes a broad historical look at the key statistical tools that have been developed over this 

' ' period, discussing their relative merits and highlighting any significant extensions and mod- 

^ , ifications. In addition, the more generalised methods that have emerged within the last few 

' years are examined. These methods propose a more rigorous statistical framework within 

^ I which to determine the LF compared to some of the more traditional methods. 1 also look 

at how photometric redshift estimations are being incorporated into the LF methodology as 
well as considering the construction of bivariate LFs. Finally, 1 review the ongoing devel- 
^ . opment of completeness estimators which test some of the fundamental assumptions going 

' into LF estimators and can be powerful probes of any residual systematic effects inherent 

. magnitude-redshift data. 

o 

■ Keywords Galaxies: luminosity function, mass function - methods: statistical - cosmol- 

I ogy: large-scale structure of the Universe 

o 

1 Introduction 

. , — I , Understanding the origins and growth of structure that form the galaxies we observe today 

' is one of the many driving forces behind current cosmological research. The luminosity 

^ I function (LF), denoted by (in units of ergs s^^ Mpc^'^), provides one of the most 
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fundamental tools to probe the distribution of galaxies over cosmological time. It describes 
the relative number of galaxies of different luminosities by counting them in a representative 
volume of the Universe which then measures the comoving number density of galaxies per 
unit luminosity, L, such that 

dN = ${L)dLdV. (1) 

where dN is the observed number of galaxies within a luminosity range [L, L + dL\. When 
working in luminosities it is common practice to apply log intervals of L. The quantity ${L) 
can be normalised such that 

oo 

j ${L)dL = p, (2) 



where p is the number of objects per unit volume V, and thus <P{L)dL gives the number 
density of objects within a given luminosity range. In general, the density function can be 
defined by p(x), where x represents the 3D spatial cartesian co-ordinates such that the total 
number A'^ of objects per unit volume (Mpc^'^) is 

N = I p(x)dx (3) 
Jv 

However, it is common place to compute p from the measured redshifts z and angular co- 
ordinates. Thus, for a given sample within a respective minimum and maximum redshift 
range z^^in and Zmax and solid angle J? at a distance r it is possible to compute, 

N= J P{^)^dz, (4) 

where p{z) is now the density as a function of redshift and dV' = flr^dr is a solid angle- 
integrated differential volume element (see e.g. Choloniewski 1987). 

The LF provides us with a robust handle to compare the difference between different sets 
of galaxies i.e. at different redshifts, galaxy types, environment etc... It allows us to assess 
the statistical nature of galaxy formation and evolution and indeed, it seems that as soon as a 
new survey is carried out one of the first actions is to compute the LF, see e.g. Blanton et al. 
(2001); Fried etal. (2001); Norberg et al. (2002); Im et al. (2002); Blanton et al. (2003b); 
Liske et al. (2003); Wolf et al. (2003); Croom et al. (2004); Richards et al. (2006); Ilbert et al. 
(2006); Faber et al. (2007); Bouwens et al. (2008a); Siana et al. (2008); Crawford et al. (2009); 
Zucca et al. (2009); Haberzettl et al. (2009); Montero-Dorta & Prada (2009); Croom et al. 
(2009b); Willott et al. (2010); Rodighiero et al. (2010); Eales et al. (2010); Tilvi et al. (2010); 
Hill et al. (2010) to highlight just a small fraction of studies over the last ten years. This is 
perhaps indicative of the continuing popularity and relevance of this area of research. 

The study of galaxy LFs is now a vast subject area spanning a broad range of wave- 
lengths and probing back to the earliest galaxies. Therefore, it should be noted that this 
review is led with a strong emphasis on areas of work that have driven the development of 
the statistical methodology of LF estimation. Consequently, there will be areas of research 
that have not been cited, and so apologies are given in advance. Instead, the most relevant 
extensions and variations of the traditional approaches are examined in detail as well as con- 
sidering the most recent generalised statistical advances for LF estimation. Furthermore, I 
explore the branch of astro-statistics concerning completeness estimators that, at their core, 
represent a test of the validity of the assumption of separability between the LF, (f>{M) and 
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the density function p{z), that is inherent in the LF methodology. I also discuss how such 
estimators have been used to constrain evolutionary models. 

The format of this article will be as follows. The remainder of this section discusses 
some of the parameterisations of the LF that have led to the maximum likelihood estima- 
tors. There is then a brief introduction to the traditional non-parametric methods before 
moving to § 2 where the historical driving forces within astronomy and cosmology are con- 
sidered both from an observational and theoretical/numerical point of view. In § 3 some 
of the critical underlying assumptions inherent to most LF estimators are discussed before 
beginning the review in § 4 with the maximum likelihood parametric approach. The focus 
is then shifted toward the myriad of non-parametric methods in § 5 before looking at how 
both the MLE and non-parametric methods have been used to construct bivariate luminosity 
functions (BLF) in § 6, as well as incorporating photometric redshift estimation into current 
LF methodology in § 7. In § 8 the results of papers which have compared the traditional es- 
timators are reviewed. I then discuss in detail some of the emerging methods that have been 
developed in recent years in § 9. § 10 explores the tests of independence and completeness 
estimators that probe the separability assumption which underpins most LF estimators. This 
then leads to a discussion in § 1 1 on some astrophysical applications for which accurate LF 
estimation has been crucially important. § 12 closes the review with final a summary and 
discussion. 



1 . 1 Parameterising the luminosity function 

The processes by which one can estimate the LF vary greatly. A popular parametric method 
developed (Sandage, Tammann, & Yahil 1979) is based on the Maximum Likelihood Esti- 
mator (MLE) where a parametric form of the LF is assumed. The most common of these 
models is that of the Press-Schechter function named after William Press and Paul Schechter 
(Press & Schechter 1974; Schechter 1976) who originally derived it in the form of the mass 
function during their studies of structure formation and evolution. It is typically written in 
the form given by. 



where, 4>* is a normalisation factor defining the overall density of galaxies, usually quoted 
in units of h^Mpc^^, and L* is the characteristic luminosity. The quantity a defines the 
faint-end slope of the LF and is typically negative, implying large numbers of galaxies with 
faint luminosities. LF studies of the local Universe (z < 0.2) have estimated close to a ~ 
— 1.0 which could imply that there may be an infinite number of faint galaxies. However, 
integrating Equation 5 over luminosity provides us with the total luminosity density, ji^ (in 
solar luminosity units, h L© Mpc^^) and ensures the total luminosity density remains finite. 




(5) 




■OO 



JL = 



L$[L) dL = 0,L,r(2 + a,L/Z„), 



(6) 



where r{a, h) is the incomplete gamma function defined as 




(7) 
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See e.g. Norberg et al. (2002) and Blanton et al. (2001) for estimates of jj^ from the respec- 
tive Two Degree Field Galaxy Redshift Survey (2dFGRS) and the Sloan Digital Sky Sur- 
vey (SDSS) redshift surveys respectively. A recent paper by Hill et al. (2010) combines the 
survey data from the Millennium Galaxy Catalogue (MGC Liske et al. 2003; Driver et al. 
2005), SDSS (York et al. 2000; Adelman-McCarthy et al. 2007) and the UKIRT Infrared 
Deep Sky Survey Large Area Survey (UKJDSS LAS Lawrence et al. 2007; Warren et al. 
2007) to determine LPs and luminosity densities over a broad range of wavelengths (UV to 
NIR) and thus probe the cosmic spectral energy distribution at z < 0.1. 

By assuming the mass-to-light ratio relation, it is also possible to use to compute the 
total contribution from stars and galaxies to the mean mass density of the Universe p (e.g. 
Loveday et al. 1992). 



At this point it should be noted that it is common practice to convert Equation 5 from abso- 
lute luminosities L to absolute magnitudes M via the simple relation 

This leads to the equivalent expression of the Schechter LF, 

Aq0.4(J\/*-M)\ 

m) = O.41n(lO)0* ^ • (9) 

In a similar way for <P{L) the LF in terms of magnitudes Al is usually plotted in log space i.e 
log[$(Af)] vs. M. For a galaxy with an observed apparent magnitude m, it is then straight- 
forward to determine its absolute magnitude M by 

M = m - 51ogio(dL) - 25 - Ag{l,b) - K{z) - E{z), (10) 

provided that the corrections for galactic extinction, Ag{l, b), if-correction (see § 3.1), K{z), 
and evolution (see § 3.2), E{z), are well understood. The quantity d]^ is the luminosity 
distance, which, by invoking the standard ACDM cosmology, is defined as, 

where Qm and J7/i represent the present-day dimensionless matter density and cosmological 
energy density constant respectively, z is the redshift of the object, c is the speed of light and 
Ho is the Hubble constant. 

Although the Schechter form of the LF has been very successful as a generic fit to a 
wide variety of survey data, it has also been shown that galaxy surveys sampled in the infra- 
red have yielded LFs that do not seem to fit the standard Press-Schechter formalism. For 
example, in Lawrence et al. (1986) the following power law analytical form was fitted to 
data obtained from the Infrared Astronomical Satellite (IRAS), 

where pi and define the slopes of the two power laws. In Saunders et al. (1990) a log- 
Gaussian form was adopted for a survey also using the IRAS given by 



d'P . ( L^^ ^ 



0(L) = - = ^.(-) exp 



1,2/-, L 



(13) 
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where 7 is analogous to a in the Schechter formahsm. Sanders et al. (2003) instead fitted a 
broken power-law of the form 

0(L)ocL", (14) 

to IRAS 60 fim bright galaxy sample. This form of the LF has more recently been adopted 
by Magnelli et al. (2009, 2011) when probing the infrared LF out to z = 2.3 using Spitzer 
observations. 

A study of Early-Type galaxies within the Sloan Digital Sky Survey (SDSS) by Bemardi et al. 
(2003a) found that a Gaussian function of the following form best described the data: 

For the study of quasi-stellar objects (QSO) Boyle et al. (1988a,b) demonstrated that a two- 
power law parameterisation is a reasonable fit to the data, 

^{^^) — ]^Q0.4[M-A/*](a+l) _|_ 2^0O-4[M-A/*](^-|-l) ' ^^^^ 

with a break at M* and a bright-end slope a steeper than the faint-end slope /3. 



1.2 Toward more robust estimates of the luminosity function 

Complementary to the parametric approach are the non-parametric methods which do not 
require any underlying assumption of the parametric form of the LF. The traditional ap- 
proaches for this class of estimator are the classical number count test (e.g Hubble 1936b; 
Christensen 1975), the Schmidt (1968) 1/Vinax estimator, the method (e.g. Turner 
1979), and the Lynden-Bell (1971) C~ method. There is also the non-parametric coun- 
terpart of the MLE developed by Efstathiou, Ellis, & Peterson (1988) and often referred to 
as the Step-wise Maximum Likelihood method (SWML). A summary of all the traditional 
methods and their extensions is show in Table 1 at the end of § 7. 

More recently, this area of work has seen a renaissance where more generalised tech- 
niques have emerged in an attempt to provide a more rigorous statistical footing. One such 
approach by Schafer (2007) developed a semi-parametric approach that differs crucially 
from the above methods by not explicitly assuming separability between the luminosity 
function, (^(M), and the density function, p{z) (see § 3.3.3 for further discussion of separa- 
bility). Alternatively, work by Andreon (2006) and Kelly et al. (2008) developed approaches 
rooted within a Bayesian framework to estimate the LF. For the bivariate LF (BLF), Takeuchi 
(2010) applies the copula ' to construct the far-ultraviolet (FUV) - far-infrared (FIR) BLF. 
A non-parametric method by Le Borgne et al. (2009) has been developed and applied to 
multi-wavelength high redshift IR and sub-mm data. The method differs from other non- 
parametric estimators by applying an inversion technique to galaxy counts that does not 
require the use of redshift information. With no assumption on the parametric form of the 
LF, the method uses an assumed set of spectral energy distribution (SED) templates to ex- 
plore the range of possible LFs for the observed number counts. 

All these diverse approaches perhaps underline the difficulty in its nature where intrin- 
sic bias can often lead to conflicting results. As a result it is quite often the case that several 

' The copula is a function used to join multivariate distribution functions to their one-dimensional marginal 
distribution function and is particularly useful for variables with co-dependence. 
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methods are applied to a given sample to help achieve consistency and quantify any bias 
found. For example, Ilbert et al. (2005) developed their "Algorithm for Luminosity Func- 
tion" (ALF) tool that implements the l/Knax, C"*" (a modified version of Lynden-Bell's 
method), SWML and the STY estimators applied to the VIMOS-VLT DEEP survey data 
(see also Zucca et al. 2006; Ilbert et al. 2006). More recently, these estimators were applied 
together to the zCOSMOS survey (Zucca et al. 2009). 

There have been several reviews of the LF methodology as it has developed over the 
years. The earliest of these was by Felten (1977) who performed nine determinations of 
the LF using variations of the classical method. BST88 gave a comprehensive review of 
all non-parametric and parametric methods that had been developed up to 1988. However, 
key papers by Heyl et al. (1997), Willmer (1997), Takeuchi, Yoshikawa, & Ishii (2000) and 
Ilbert et al. (2004) have made direct comparisons of the traditional methods with the use of 
real and simulated survey data. For the first time a rigorous exploration into their relative 
merits was made by applying the major LF estimators to different scenarios from e.g. homo- 
geneous samples to ones with strong clustering properties, density evolution, varying LFs, 
observational bias from Jf-corrections etc. (see § 8). For a nice introduction to LF estimators 
with numerous examples, the reader is also directed to Chapters 6 and 7 of Wall & Jenkins 
(2003). 



2 Motivation 

We do not yet have a complete theory that fully describes how galaxies form and evolve into 
the structures we observe today, and any model of formation and evolution must match the 
observed data. Since the fledgling redshifts surveys during the 1970s, the mapping of the sky 
has turned into a thriving industry where the largest survey to date, the Sloan Digital Sky 
Survey, has imaged in excess of 270 million galaxies with approximately a million of these 
having spectroscopic redshifts. As as a result, luminosity function studies have been a vital 
tool with which to analyse these surveys, since they allow us to probe the evolutionary pro- 
cesses of extragalactic sources as a function of redshift and thus place powerful constraints 
on current galaxy formation and evolutionary models. 

At around the same time redshift surveys came to fruition, the theoretical framework 
was being laid for how we currently model galaxy formation and evolution. Moreover, as 
computational technology came of age, the field of numerical cosmology saw rapid devel- 
opment providing state-of-the-art N-body simulations based on our current knowledge of 
these physical processes and observations. 

2. 1 Constructing a model of galaxy evolution 

The current standard Concordance model of cosmology represents the most concise model 
to date, combining astronomical observations with theoretical predictions to explain the ori- 
gins, evolution, structure and dynamics of the Universe. The origin of the Concordance 
model is rooted in the Copernican principle - a fundamental assumption proposed by Nico- 
laus Copernicus in the 16th century that states we do not occupy a privileged position in the 
Universe. This concept was generalised into what is termed, the Cosmological Principle, in 
which we assume that the Universe is both homogeneous and isotropic. In 1922 Alexander 
Friedman provided a formal description in terms of a metric that was later independently 
improved upon by Howard Robertson, Arthur Walker and Georges Lemaitre into what is 
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now referred to as the Friedmann-Lemaitre-Robertson-Walker metric, or more simply - the 
FLRW metric. 

In its current, simplest form, the standard model is known as A Cold Dark Matter 
(ylCDM). Whilst dark matter particles have not yet been directly detected, there are a number 
of independent observations derived from high velocity dispersions of galaxies observed in 
clusters (Zwicky 1933) and of flat galaxy rotation curves (Rubin et al. 1980; Sofue & Rubin 
2001), which support its infen^ed existence. The A term refers to the non-zero cosmological 
constant in general relativity theory which implies the Universe is currently undergoing a 
period of cosmic acceleration (see e.g. Peebles & Ratra 1988; Steinhardt et al. 1999). The 
recent observational techniques derived from distance measurements of Type la supernovae 
by Riess et al. (1998) and Perlmutter et al. (1999) have provided evidence of this accelera- 
tion; later studies from baryonic acoustic oscillations (Eisenstein et al. 2005), the integrated 
Sachs-Wolf (ISW) effect (Giannantonio et al. 2008) and weak lensing (Schrabback et al. 
2010) have helped strengthen support of the ylCDM model. 

By the 1960s a theory of galaxy formation was proposed by Eggen et al. (1962) known 
as monolithic collapse (or the 'top-down' scenario) in which galaxies originate from large 
regions of primordial baryonic gas. This baryonic mass then collapses to form stars within 
the central region allowing the most massive galaxies to form first. Seminal work by James 
Peebles (see e.g. Peebles 1970; Gunn & Gott 1972; Peebles 1973, 1974, 1980) estaWished 
the theoretical underpinnings for structure formation known as the gravitational instabil- 
ity paradigm. In essence, this theory states that small scale density fluctuations in the early 
Universe grew by gravitational instability into the structures we see today. This led to the es- 
tablishment of the Hierarchical clustering model (or 'bottom-up' scenario) in which larger 
objects evolve from mergers with smaller objects (see e.g. Searle & Zinn 1978). The bottom- 
up scenario is now the more favoured of the two models and is fundamental to galaxy forma- 
tion models, ultimately leading to the theoretical and numerical modelling applied in current 
N-body simulations. 

The work by Press & Schechter (1974) demonstrated how the structure of halos in the 
early Universe could form through gravitational condensation in a density field using a 
Gaussian random field of gas-like particles. Thus, the first analytical treatment of the mass 
function was derived. White & Rees (1978) developed these ideas formulating a more so- 
phisticated model of galaxy formation in which dark matter halos formed hierarchically and 
consequently baryons cooled and condensed at their centres to form galaxies. Thus, small 
proto-galaxies form early in the history of the Universe and through merging processes build 
up into larger galaxies we see today. 

The production of early N-body codes of the 1960s and 1970s (e.g. Aarseth 1963; 
Gingold & Monaghan 1977; Lucy 1977) began to see rapid development during the 1980s. 
Aarseth et al. (1979) was one of the first to develop simulations of galaxy clustering. How- 
ever, as the Cold Dark Matter model (Blumenthal et al. 1984) garnered momentum, Davis et al. 
(1985) provided the first simulations within the hierarchical clustering CDM framework. 
Over the next two decades hydrodynamics were integrated into simulations that would in- 
clude processes such as star formation, feedback and gas cooling (see also e.g. Fall & Efstathiou 
1980; White et al. 1987; Schaeffer & Silk 1988; White & Frenk 1991; Katz & Gunn 1991; 
Navarro & Benz 1991; Cen & Ostriker 1992; Katz & White 1993; Cole et al. 1994; Lacey & Cole 
1994; Navarro et al. 1994; Springel et al. 2001 ; Springel 2005; Keres et al. 2005; Bower et al. 
2006; De Lucia & Blaizot 2007; Bertone et al. 2007; Weinberg et al. 2008; Dave et al. 2008; 
Keres et al. 2009, highlighting just a few). 

However, despite the tremendous achievements made in the development of cosmologi- 
cal simulations, there remains a number of discrepancies between astronomical observations 
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and what has been simulated within a ylCDM framework. This has led some to question the 
validity of the current concordance model. Such discrepancies include the lack of direct 
detection of dark matter; the missing satellite problem, where observations of the local Uni- 
verse have found orders of magnitude less dwarf galaxies than predicted by simulations (e.g. 
Klypin et al. 1999; Moore et al. 1999; Willman et al. 2004; Kravtsov 2010); the core-cusp 
problem, in which observations indicate a constant dark matter density in the inner parts 
of galaxies (de Blok 2010), whilst simulations prefer a power-law cusp as originally devel- 
oped by Navarro et al. (2004); and finally, the angular momentum problem, in which ylCDM 
simulations have had difficulties reproducing disk-dominated and bulgeless galaxies. 

Whilst the existence of dark matter remains an illusive quantity, possible solutions to 
addressing the remaining issues may lie with not only in improving simulations with higher 
resolutions and more rigorous treatment of physical mechanism within galaxies (e.g. su- 
pernovae feedback), but also obtaining higher resolution observations from next genera- 
tion telescopes (see e.g. Gnedin & Zhao 2002; Governato et al. 2004; Robertson et al. 2004; 
Simon & Geha 2007; Governato et al. 2007; Simon & Geha 2007; McConnachie et al. 2009, 
for recent developments). Of course there exists the possibility that maybe as these improve- 
ments are made we may require an entirely different model beyond ylCDM. 



2.2 The rise and rise of redshift surveys 
2.2.7 Measuring redshift s 

The difference between redshifts obtained spectroscopically or photometrically essentially 
reduces to the precision of the methods. Spectroscopy is the most precise and, consequently, 
the most popular way to measure redshifts. The technique requires identification of spectral 
lines (typically emission lines) and record their wavelength A. The relative shift of a these 
lines compared to their known position measured in the laboratory (Ao) allows us to measure 
the redshift. 

The acquisition of redshift data in this way can be described as a two-stage process. 
One firstly images galaxies in a region of space using low resolution photometry and then 
targets these galaxies to perform high resolution spectroscopy requiring long integration 
times to achieve sufficiently high signal-to-noise. Redshift surveys such as the Two Degree 
Field Galaxy Redshift Survey (2dFGRS) and the Sloan Digital Sky Survey (SDSS) used 
multi-fibre spectrographs which could, respectively, record up to 400 and 600 redshifts si- 
multaneously. 

Alternatively, the use of photometry, offers a much quicker way to estimate redshifts to a 
much greater depth and has thus gained in popularity in recent times. However, the precision 
to which they are currently measured remains poor, with a typical uncertainty in the range 
0.05 5z < 0.1. Moreover, effects from e.g. absorption due to galactic extinction and the 
Lyman-alpha forest can contribute to systematics (see e.g. Massarotti et al. 2001). The tech- 
niques of photometric estimation of redshifts can be traced back to Baum (1962) who first 
developed a method using multi-band photometry in 9 filters for elliptical galaxies. Further 
work by Loh & Spillar (1986) and Connolly et al. (1995) saw this area develop into two 
respective techniques: the template fitting methods and empirical fitting methods. Template 
fitting requires a library of theoretical or empirical SEDs to be generated coupled with spec- 
troscopic redshifts (for calibration purposes) which are then fitted to the observed colours of 
the galaxies, where redshift is a fitted parameter (see e.g. Bolzonella et al. 2000). Empirical 
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fitting methods, whilst similar, instead use empirical relations with a training set of galax- 
ies with spectroscopically obtained redshifts (see also e.g. Brunner et al. 1997; Wang et al. 
1998). In the case of Connolly et al. (1995), for example, they used linear regression where 
the redshift was assumed to be a linear or quadratic function of the magnitudes. 

Both methods have been improved upon by the incorporation of, for example, Bayesian 
inference (Kodama et al. 1999; Bem'tez 2000; Stabenau et al. 2008; Wolf 2009), nearest 
neighbour weighting schemes (e.g. Ball et al. 2008; Lima et al. 2008), boosted decision trees 
(Gerdes et al. 2010), and artificial neural networks (Firth et al. 2003; Vanzella et al. 2004; 
CoUister et al. 2007; Oyaizu et al. 2008; Yeche et al. 2010). Alternatively, others have at- 
tempted more generalised approaches that incorporate aspects of both the traditional meth- 
ods (e.g. Budavari 2009). 

Improvements on photometry have also assisted the case for photo-z. For example, the 
COMBO-17 photometric redshift survey produced multi-colour data in a total of 17 optical 
filters - five broad-band filters (UBVRI) and 12 medium-band covering a wavelength range 
of 400 to 930 nm (see e.g. Wolf et al. 2004). Having this many filters allowed significant 
increases in resolution improving galaxy redshift accuracy to 5z ~ 0.03 (Wolf et al. 2003; 
Bonfield et al. 2010). 

2.2.2 Redshift surveys 

Galaxy redshift surveys have played, and will continue to play a vital role in our under- 
standing of the formation, evolution and distribution of galaxies in the Universe. Prior to 
the 1970's, models of the structure of the Universe were based on the observed distribution 
of galaxies projected onto the plane of the sky. Whilst early pioneers had already identified 
the clustering nature of galaxies from 2-D samples (e.g. Hubble 1936b; Charlier, C. V. L. 
1922), it would take the move to three-dimensional data-sets before the wider astronomy 
community would accept these claims. As a consequence, this required the measurement 
of redshifts on a much grander scale. To make a large enough survey where redshifts of 
thousands of galaxies could be measured would require a lot of dedicated telescope time 
and funding. Nevertheless, it was in 1977 that these investments were made and dedicated 
redshift surveys began. 

The first major breakthroughs in mapping large-scale structure began with the CfA sur- 
vey which ran from 1977 to 1982 (Huchra et al. 1983) and measured spectroscopic redshifts 
for a total of 2401 galaxies out to a limiting apparent magnitude of miim < 14.5 mag. This 
survey represented the first large area maps of large-scale structure in the nearby universe 
and confirmed the 3-D clustering properties of galaxies already proposed a little over 50 
years previously. CfA2 (Geller &. Huchra 1989) continued the survey measuring a total of 
18,000 redshifts out to 15,000 kms^^ and miim < 15.5 mag (see Figure 1). With survey data 
provided by the Southern Sky Redshift Survey (SSRS/SSRS2) da Costa et al. (1998, 1994), 
CfA was extended to include the southern hemisphere. Despite this tremendous achieve- 
ment, spectrographic technological constraints allowed only one galaxy at a time to be ob- 
served, making the whole process extremely time consuming. However, the technology de- 
velopments during the 1980s provided the first multi-object fibre spectrographs that allowed 
between 20 and 200 galaxies to be observed simultaneously during one exposure. 

A new era of space-based telescopes began with the Infrared Astronomical Satellite 
(IRAS), launched 1983. The IRAS Point Source Catalogue (PSCz) redshift survey ran from 
1992 to 1995 and mapping 15,41 1 galaxies over 84% of the sky out to 0.6 Jy in the far-IR (60 
/^m) (Saunders et al. 2000) marking the first all-sky survey (see Figure 2 left). The Hubble 
Space Telescope (HST) was launched in 1990 and the Hubble Deep Field-North (HDF-N) 
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Fig. 1 Extract strips from the CfA redshift surveys. The strip on the sky was 6 degrees wide and 130 
degrees long with our origin begin at the apex of the wedge. Image courtesy of the Smithsonian Astrophysical 
Observatory Emilio Falco et al. (1999). 



survey in 1995 (Williams et al. 1996) was the next landmark, allowing unprecedented detail 
of faint galaxy populations to a magnitude of my = 30 mag out to high redshift. In 1998 the 
follow-up survey, HDF-South, sampled a random field in the southern hemisphere sky with 
equal success (see e.g. Femandez-Soto et al. 1999). The more recent Hubble Ultra Deep 
Field (HUDF Beckwith et al. 2006) ran from 2003 to 2004 and has allowed evolutionary 
LF constraints on the very faintest galaxies to redshifts reaching the end of the epoch of 
re-ionisation, ^p/ioto ~ 6 (e.g. Ferguson et al. 2000; Kneib et al. 2004; Bouwens et al. 2009; 
Zheng et al. 2009; Finkelstein et al. 2010; McLure et al. 2010; Oesch et al. 2010). 
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Fig. 2 Left - 3D representation of the of the PSCz survey (Image courtesy of Dr. Luis Teodoro). Middle - 
The 2dF galaxy redshift survey (2dFGRS) final data release showing one of the two wedges (Image courtesy 
of http://msowww.anu.edu.au/2dFGRS/). Right - The Sloan Digital Sky Survey (SDSS) (Image courtesy of 
http://www.sdss.org). 



Throughout the 1990s there were a number of surveys that paved the way to con- 
strain the local LF out to 2 ~ 0.2 such as the Stromlo-APM Redshift Survey (S-APM) 
(Loveday et al. 1992), Las Campanas (LCRS) (Lin et al. 1996) and the ESO Slice Project 
(ESP) (Zucca et al. 1997). However, the next milestone from ground-based telescopes was 
in the form of the Two Degree Field Galaxy Redshift Survey (2dFGRS) (CoUess 1998). 
This survey ran from 1998 to 2003 and used the multi-fibre spectrograph on the Anglo- 
Australian Telescope (AAT), which could measure up to 400 galaxy redshifts simultane- 
ously. The photometry was taken from the Automatic Plate Measuring (APM) scans of the 
UK Schmidt Telescope (UKST) plates with measured magnitudes out to mj,™ = 19.45 mag. 
The 2dFGRS team recovered a total of 245,591 redshifts, 220,000 of which were galaxies 
(see Figure 2 middle). With this increase in instrumentation precision and the vast number of 
objects catalogued, the scientific goals became equally ambitious. Some of 2dFGRS goals 
included measuring the power spectrum of the galaxy distribution on scales up to few hun- 
dred Mpc^^, determining the galaxy LF, clustering amplitude and the mean star-formation 
rate out to a redshift z ~ 0.5. 

At around the same time as the 2dFGRS another team was carrying out a survey called 
the Two Micron All Sky Survey (2MASS) (Skrutskie et al. 2006). This saw the return of a 
near-infrared full sky survey and was the first all-sky photometric survey of galaxies brighter 
than rax = 13-5 mag cataloguing approximately 100,000 galaxies. 

However, it is the Sloan Digital Sky Survey (SDSS) that takes the prize as the most ambi- 
tious survey to date by mapping a quarter of the entire sky to a median spectroscopic redshift 
Zspec ~ 0.2 using multi-band photometry with unprecedented accuracy (e.g. Abazajian et al. 
2003; Adelman-McCarthy et al. 2006; Abazajian et al. 2009). Using the dedicated, 2.5-meter 
telescope on Apache Point, New Mexico, USA and multi-fibre spectrographs, SDSS has im- 
aged over 200 million galaxies and obtained just over 1 million spectroscopic redshifts. The 
combination of 2dFGRS and SDSS has provided the most accurate maps of the nearby Uni- 
verse placing strong constraints on the LF out to z ~ 0.2 (Norberg et al. 2002; Blanton et al. 
2003b; Montero-Dorta & Prada 2009). 

Exploring galaxy evolution out to 2 ~ 1.0 and beyond has been and continues to be ex- 
plored with surveys such as the Canada-France Redshift Survey (CFRS) (Lilly et al. 1995), 
Autofib I & II (EUisetal. 1996; Heyl et al. 1997), the Canadian Network for Observa- 
tional Cosmology survey (CNOCl & 2) (Lin et al. 1997, 1999), COMBO-17 (Wolfetal. 
2003),VIMOS-VLT Deep Survey (VVDS) (e.g. Ilbert et al. 2005), the Deep Extragalactic 
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Evolutionary Probe 2 (DEEP2) (e.g. Willmer et al. 2006) and the more recent zCOSMOS 
(Lilly 2007; Zucca et al. 2009). All have been instrumental in constraining the statistical na- 
ture of the evolutionary processes of early-type to late-type galaxies at intermediate redshifts 
through the study of LF as a function of color (e.g. Bell et al. 2004). 

There appears to be no sign of a slowing down of redshift surveys. In fact, future projects 
such as the Dark Energy Survey and those involving the Large Synoptic Survey Telescope 
(LSST), the Square Kilometre Array demonstrator telescopes, ASKAP and MeerKAT, and 
the James Web Space Telescope (JWST) will provide the next generation of surveys with 
data-sets orders of magnitude larger, with greater wavelength coverage, and probing both 
fainter and more distant galaxies. 



3 K-corrections, Evolution and Completeness 

This section begins with a brief overview of the /^-correction and the methods adopted 
to constrain source evolution for a given population of galaxies. The final part discusses 
the two important fundamental assumptions common to all the traditional LF estimators, 
namely, completeness of observed data and separability between the luminosity and density 
functions. 



3.1 The if-correction 

The modelling of galaxy evolution and A'-correction is a vital part of most galaxy survey 
analysis, which, if not accounted for properly, can adversely affect accurate determination 
of the LF. The use of A'-correction can be traced backed to early 20th century pioneering 
observers such as Hubble (1936a) and Humason et al. (1956). The observed wavelength 
from a galaxy is different from the one that was emitted due to cosmological redshift, z. 
The iC-correction allows us to transform from the observed wavelength, Ao when measured 
through a particular filter (or bandpass) at z, into the emitted wavelength, Ae in the rest frame 
at 2 = 0. Following the derivation from Hogg et al. (2002) (see also e.g. Oke & Sandage 
1968) we consider an object with an apparent magnitude that has been observed in the 
R bandpass. However, it is desirable to obtain the absolute magnitude Mq in the rest-frame 
bandpass Q. Therefore, we firstly consider the relation between the corresponding emitted 
frequency, Ve and the observed frequency, vo, given by 

Ve = {l + z)vo, (17) 

where z is redshift at which the source object was observed. Hogg et al. then go onto show 
that the corresponding A'-correction can be determined by 



K = -2.5 log 



10 



(18) 

where 5* is often referred to as the transmission (or sensitivity) function of the detector for 
which at each frequency u is the mean contribution of a photon of frequency v to the output 
signal from the detector in the respective bandpass. The quantity g is the spectral density of 
flux for a zero-magnitude or standard source. Thus determining the K-corrected absolute 
magnitude in the rest frame of the object is simply given by 

MQ = mii-5\og{dL)~K, (19) 

where is the cosmology dependent luminosity distance already defined in Equation 11. 
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(a) Universal Schechler LF (b) PLE Model: (c) PDE Model: 

No evolution E(z) = -|33.5[log,„(H-z)] 0.(z) = 0;(z)(l+z)'' 
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Absolute Magnitude (M) 



Fig. 3 The characteristic shape of the Schechter luminosity function. The LF is typically presented in terms 
of absolute magnitudes AI with the number density on the y-axis as the logj^Q (# ). For this example the LF 
has been modelled using constrained LF parameter values based on the 2dFGRS results. In panel (a) this 
implies: a = —1.21, M, = —19.61 and 4>* = 1-61 X Mpc^^. The steepness of the faint-end 

slope is determined by the a parameter and the characteristic magnitude, A/» , indicates the 'knee' of the LF. 
Panel (b) shows the effect of introducing a pure luminosity evolution (PLE) model. In terms of magnitudes 
this model is shown as E{z) and is dependent on the evolutionary parameter /3. The red Hne shows the LF 
from panel (a) and the dotted lines (from left to right on the plot) represents the same LF for a range of f} = 
1.5, 1.0, 0.5, -0.5, -1.0 and -1.5, at a redshift of z = 0.2. In panel (c) a pure density evolution (PDE) model 
is alternatively introduced in to the LF. As with panel (b), the dotted lines (from top to bottom on the plot) 
represent LFs for the range of 7 = 6.0, 4.0, 2.0 and -2.0, -4.0 and -6.0 at the same redshift. 



3.2 Evolution 

From a purely practical point of view, a frustrating aspect of studying galaxy evolution 
resides in our inability to trace the individual evolutionary processes of a galaxy through 
cosmological time. Since we do not have a concise and complete theory of galaxy formation 
and evolution, we can instead examine the statistical properties of populations of galaxies 
at different epochs (see e.g. Rowan-Robinson et al. 1993; Marzke et al. 1994a; Lilly et al. 
1995; Willmer et al. 2006; Heyl et al. 1997; Faber et al. 2007, for detailed studies). In this 
section two different approaches regularly applied to constrain evolution are outlined. 

The first method adopts a strict parametric form based on some physical assumptions 
regarding our current understanding of luminosity evolution and/or number (mergers) evolu- 
tion (see e.g. Wall et al. 1980; Phillipps & Driver 1995; Heyl et al. 1997; Driver et al. 2005; 
Walletal. 2008; Prescott et al. 2009; Croom et al. 2009b; Rodighiero et al. 2010). In the 
Pure Luminosity Evolution (PLE) scenario it is assumed that massive galaxies were assem- 
bled and formed most of their stars at high redshift and have evolved without merging i.e. not 
convolved number evolution. The correction applied to account for this form of evolution is 
redshift and galaxy-type dependent and typically takes the form 

L*{z) = L40){l + zf (20) 

where /3 is the evolution parameter and although it is galaxy-type dependent, a global cor- 
rection is often adopted. Applying this correction to magnitudes, the evolution correction 
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E{z) is the written as 

= M^{0) - E{z) (21) 
£(2) = -/J2.51ogio(l + 2) (22) 

For the study of quasi-stellar objects (QSOs) in the 2df QSO redshift survey, Croom et al. 
(2004) instead, characterised luminosity evolution as a second-order polynomial luminosity 
given by 

U{z) = 140)10''^''+''^''' (23) 
M*(2) = M*(0)-2.5(fciz + fc2z2) (24) 

where the model requires a two parameter fit given by fci and fc2. 

Pure Density (Number) Evolution (PDE) assumes that galaxies were more numerous in 
the past but have since merged. The treatment of this type of evolution can be modelled in a 
similar way as PLE, 

M^) = <j,*o{z){i + zy, (25) 

where the parameter, 7, is the number evolution parameter. By assuming either of the above 
models (or a combination of both) one can apply this technique to observables and simul- 
taneously constrain both the LF and E{z) parameters via a maximum likelihood technique. 
Whilst this has proved to be a popular approach it should be noted that the wrong evolution- 
ary model may be adopted in the analysis yielding questionable results. 

The second alternative approach is termed as a free-form technique that, as the name sug- 
gests, requires no strict parametric assumption for the evolutionary model. An early method 
by Robertson (1978, 1980) developed an iterative procedure for exploring evolution of the 
radio luminosity function (RLE) whereby E{z) was sampled over multiple log(z) slices un- 
til convergence. Whilst this approach allowed a free-form analysis in redshift, assumptions 
regarding luminosity evolution were still required and there was no built-in measure of the 
range of evolution allowed by the data (see e.g. Zawislak-Raczka & Kumor-Obryk 1986, 
1987, 1990, for applications and extensions). Peacock & Gull (1981) and Peacock (1985) 
developed a fully free-form methodology that instead assumes a series expansion for the 
evolutionary model and employs a minimisation technique to constrain its coefficients. 
A more recent development in this area by Dye & Eales (2010) extends the ideas of Peacock 
and Gull by introducing a reconstruction technique that discretises the functions of redshift 
and luminosity and employs an adaptive gridding in (L — z) space to constrain evolution. 

Recent observational techniques (e.g. Blanton et al. 2003a; Bell et al. 2004) have iden- 
tified a bimodal distribution in color across a broad range of redshifts allowing clear segre- 
gation of early- (red) and late- (blue star forming) type galaxies. This has allowed deeper 
exploration into the evolutionary properties of these populations beyond the simple e.g PLE 
approach (see § 11.1.2 for further discussion). 

Figure 3 shows a simple example of a typical Schechter LF modelled on results from the 
2dFGRS Norberg et al. (2002) paper. In panel (a) $(M) is plotted in log space. The input 
parameters are, a = -1.21, Ah = -19.61 and = 1.61 x 10~^/i Mpc~^. The panel 
indicates the various components that make up the shape of the LF i.e. the power-law slope at 
the faint end the steepness of which is governed by q; Ah, which characterises the turnover 
from the bright-end to the faint end; and the exponential cut-off which constrains the bright 
end of the LF. Panel (b) then shows how pure luminosity evolution (PLE) affects the shape 
of the LF by incorporating the PLE model from Equation 22 into the Schechter function for 
a range of /? given by [1.5, 1.0, 0.5, -0.5, -1.0 , -1.5] at a fixed redshift of z = 0.2. These 
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are indicated, respectively, from left to right on the plot by the dotted lines. The red solid 
line shows the Universal LF from panel (a). As one would expect, the LF is affected only in 
its position in magnitude and is independent of any shift in number density. Panel (c) now 
shows the case where only pure density evolution (PDE) is present by adopting Equation 25 
for a range of 7 given by [6.0, 4.0, 2.0 and -2.0, -4.0, -6.0] as, respectively, indicated by 
the dotted lines from top to bottom in the plot. It is clear that the shape of the LF is now 
dominated by shifts in the direction. 

3.3 Completeness, separability and the selection function 

In general, when compiling a magnitude-redshift catalogue, we would like to be able to 
quantify in some way, how close we are to having a representative sample of the underlying 
distribution of galaxies. However, there are a number of constraints preventing us from ob- 
serving all objects in the sky. This is termed a measure of completeness for a given survey. 
The ability to interpret and measure it accurately is not trivial. There are many diverse con- 
tributing sources of m-completeness that have to be corrected for and understood to construct 
accurately the LF. Nevertheless, an assumption of LF estimation is one of completeness of 
a volume- or flux-limited survey. For clarity, I now discuss two general interpretations of 
completeness that commonly appear in the observational cosmology literature: 

3.3.1 Redshift completeness 

This can be described by the process of combining photometric and spectroscopic galaxy 
survey catalogues. In the most general sense, an observer will measure apparent magnitudes, 
m, of galaxies in a portion of the sky out to a faint limiting apparent magnitude, m\^-^ 
imposed by the physical limitations of the telescope. CCD instrumentation can be affected 
by pixel saturation due to very bright objects. This imposes a bright limit to the survey, m\^. 
At this stage the catalogue consists of measured magnitudes and sky positions only, without 
their 3D spatial redshift distribution. Therefore, each galaxy is then targeted to obtain a 
measure of its redshift. The most accurate approach is by multi-fibre spectroscopy, where 
optical fibres are positioned on a plate which has holes drilled at the positions of the sources 
measured from the photometry. However, a drawback of measuring redshifts in this way 
arises from spatial limitations. For example, in a region that has a high density of objects, a 
lot of galaxies may be missed since there is a physical spatial limit on how close the fibres 
can be placed i.e. fibre collisions. Therefore, redshift completeness can be presented as the 
percentage of successfully measured redshifts over a list of targeted galaxies within a survey. 
Whilst this is in itself an important contributing factor to the overall completeness picture, 
we can describe the overarching completeness in terms of magnitude completeness. 

3.3.2 Magnitude completeness 

At this stage we can now ask the question, how do we know our percentage of successfully 
matched targets is complete up to (or within) the apparent magnitude limit(s), ml^^-^ (and 
mj^jjj)? Some of the contributing factors that affect this specific type of completeness can 
be summarised as: galaxies that are missed because they are located close to bright stars or 
lie close to the edge or defected part of the CCD image; the wrong A'-corrections applied; 
the wrong evolutionary model adopted; galaxies with the same surface brightness that may 
or may not be detected depending on their shape and overall extent i.e. a compact object is 
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more likely to have enough pixels above the detection limit than a very diffuse galaxy of the 
same brightness; and adverse effects from a varying magnitude limit over a photographic 
plate or CCD image. 

Cosmological surface brightness dimming of objects due to the expansion of the uni- 
verse has been examined by e.g. Lanzetta et al. (2002). By studying the distribution of star- 
formation rates as a function of redshift in the Hubble Deep Field they demonstrated that 
such an effect would bias derived quantities such as the luminosity density. 

An effect explored by Ilbert et al. (2004) that can potentially bias the shape of the global 
LF arises from a wide range of JiT-corrections being applied across different galaxy types. 
Such an effect results in a varying galaxy-type dependent absolute magnitude limit where 
certain galaxy populations will not be detectable out to the full extent of the magnitude limit 
of the survey. This form of incompleteness is particularly crucial toward the faint end of the 
LF and discussed in greater detail in § 8. 

A more fundamental form of incompleteness arises from the observational limitations of 
a telescope and is often referred to as Malmquist Bias (see e.g. Hendry & Simmons 1990). 
As one images out to higher redshifts only intrinsically bright sources will be observed. 
Since bright objects at large distances are rare, one observes a decrease in the number density 
of imaged objects as a function of redshift. A way to quantify this effect is to compute the 
selection function, the probability that a galaxy at a redshift z will be included in a given 
magnitude- (or flux) limited survey. Thus, in the simplest scenario, the selection function 
S{z) may be expressed as 

S{z) = (26) 

I-oo ^^^) '^^ 

where Mn^n is the absolute magnitude limit of the survey. In practical terms this obser- 
vational effect implies we are sampling less of the underlying distribution of galaxies at 
increasing redshifts. To correct for this effect it is common to apply a weighting scheme by 
incorporating the inverse of the selection function in the LF estimation such that 

™ °^ (27) 
S{z) 

More generally, both magnitude and redshift completeness definitions can now be grouped 
in terms of, the probability that a galaxy of apparent magnitude, m, is observable. 



3.3.3 Separability 



Weighting galaxies via the selection function introduces a very large assumption that is 
central to all the traditional methods for constructing the LF - separability between the 
probability densities 4>{M) and p{z). This directly implies that the absolute magnitudes M 
(or luminosities L) are statistically independent from their spatial distribution and thus the 
LF has a Universal form. As such, the joint probability density of M and z, P{M, z), can be 
expressed in a separable form as the product of the two univariate distributions such that 

P(M,z) (xcf>{M)p(z) (28) 

The assumption of separability between the probability densities, can thus be thought of 
as an extension to the idea of magnitude completeness. In this scenario it is assumed that 
the absolute magnitudes, M, have also been corrected, where deemed necessary, for any 
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luminosity and/or density evolution (Equations 20 and 25, respectively), galactic extinction 
and if-correction. In terms of the LF, these corrections can also be thought of as additional 
contributing factors to completeness. Therefore, only when such corrections have been ac- 
curately made can the separability assumption be valid. 

4 The Parametric Maximum Likelihood Estimator 

This review begins with the maximum likelihood estimator (MLE). As a statistical tool, the 
MLE is by no means a recent development. Its origins can be traced as far back as e.g. 
Bernoulli (1769, 1778) through to R. A. Fisher who provided a more formal derivation of 
the MLE in his seminal papers Fisher (1912, 1922). For a comprehensive historical review 
see e.g. Aldrich (1997) and Stigler (2008). In terms of its application within the context of 
observational cosmology it was Sandage, Tammann, & Yahil (1979), hereafter STY79, who 
were the first to see it as powerful approach to galaxy LF estimation. This is a parametric 
technique which therefore assumes an analytical form for the LF and thus eliminates the 
need for the binning of data (as usually required by most non-parametric methods). The 
equally popular non-parametric counterpart of the MLE, the step-wise maximum likelihood 
(SWML), is discussed in § 5.5. 

The derivation of the MLE begins by considering x, a continuous random variable, that 
is described by a probability distribution function (PDF) given by 

f{x;e), (29) 

where 6 represent the parameter we wish to estimate. As we shall see, in practice 9 represents 
more than one parameter of the LF. If x represents our observed data then the likelihood 
function, C, can be written as 

N 

f{xi,X2, XN\ei,e2, Sfc) = £ = J] fix,; 61,62, ^fc) (30) 

where Xi are A'' independent observations. It is often the case that the likelihood function is 
expressed in terms of the logarithmic likelihood such that 

N 

ln{£} = ^ln/(a;,; 01,^2,. ...Sfc) (31) 

1=1 

Constraining the 61,62, ■■■,6^. parameters is, in principle, a straightforward matter of max- 
imising the likelihood function C{6) or ln{£(6')} such that 

"^^°;;;^^» =o, ,=1,2,...,^ 02) 

In the context of estimating the parameters of the LF we consider a galaxy at redshift z 
for which we can define the cumulative luminosity function (CLE) and thus determine the 
probability that the galaxy will have an absolute magnitude brighter than M as 

M 

/ (l){M')p{z)f{M')dM' 

PiM\z) = =i , (33) 

/ (f>{M')p{z)f{M')dM' 

— 00 
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where p{z) is the density function for the redshift distribution, f{M') is the completeness 
function which for a 100% complete survey would be 



f{M') 



faint 



f 1, M,'?"s''' < M' < Ml; 

' lim — — li 

(34) 

0, otherwise. 



It follows that the probability density for detected galaxies is given by the partial derivative 
of P{M, z) with respect to M, 

^ dp{M,z) ^{Mj) 

OM Albright(iri) 

/ (t){M')dM' 

Note that the density functions have cancelled thus rendering the technique insensitive 
to density inhomogeneities. Finally, the likelihood is maximised to give 

N 

C = Wp{M,,Zi). (36) 

1=1 

Most commonly a Schechter function is assumed where the parameters that we wish to 
estimate are a and A/* as defined in Equation 9 on page 4. 



4.1 Normalisation, goodness of fit & error estimates 

Although the MLE method has become more popular than other traditional non-parametric 
methods there are aspects not to be overlooked. This approach does not determine the nor- 
malisation parameter 0* of the LF and consequently has to be estimated by independent 
means. The approach originally described by Davis & Huchra (1982) and later adopted 
by e.g. Loveday et al. (1992); Lin et al. (1996); Willmer (1997); Springel & White (1998); 
Blanton et al. (2003c); Montero-Dorta & Prada (2009) incorporates a minimum variance 
density estimator to determine the mean density of objects. The method can be summarised 
as follows. The normalisation can be cast in terms of 

_E£^ (37, 



/ dvs{z)w{zY 

where Ng^^i is the number of galaxies in the sample, w^z) is a weighting function for each 
galaxy defined by the inverse of the second moment of the two-point correlation function 
given by 

A^) = .^-\ V (38) 
1 -h nJ3S{z) 

and S{z) is the selection function for the survey defined within a maximum and minimum 
redshift range, 

S{^) = (39) 
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where the quantity J3 is the integral of the correlation function given by 

J3= drr^^{r). (40) 
The normalisation is then calculated iteratively and the error can be computed by 



on 



ll/2 

(41) 



/ dV(p{z)w{z) 

Davis & Huchra (1982) and Willmer (1997) point out that whilst this method is robust it 
can return a biased estimate if the survey sample has significant inhomogeneities. In a more 
recent paper by Hill et al. (2010) it was commented that further bias may be introduced due 
to incompleteness at higher redshifts resulting in over weighting of 4>*. For more exploration 
into this and other normalisation methods, see Willmer (1997). 

In terms of the goodness-of-fit of the adopted parametric form of the LF, this too, as 
highlighted by Springel & White (1998), is not built into the MLE and, therefore, has to be 
assessed independently. For survey samples that may not so obviously be described by a 
Schechter function, caution should be taken as this implies that nearly any functional form 
could be made to fit a given data-set. Furthermore, the nature of the method effectively de- 
termines the slope of the LF at any point. One can, however, apply a simple minimisation 
test to probe the goodness-of-fit. Aside from this, if the survey sample is not complete near 
the apparent magnitude limit sources close to the limit will be underestimated thus making 
the slope of the LF underestimated (Saunders et al. 1990). 

A standard approach for estimating the relative error on the LF parameters A/* and a was 
adopted by Efstathiou et al. (1988). This involves jointly varying these parameters around 
the maximum likelihood value to find where the likelihood increases by the /3-point of the 

distribution; we have 

ln/: = ln/:„,ax-ix^(^') (42) 

where is the number of degrees of freedom corresponding to Ax^ = 2.30 for la limit 
(68.3% confidence interval) and Ay^=(3.\l for 2cr limit (95% confidence interval). 



4.2 Further extensions 



The STY79 method remains one of the most widely applied LF estimators to date and as 
a result has been modified over the years. For example, Marshall et al. (1983) (hereafter, 
M83) extended its use for quasars by simultaneously fitting evolution parameters with the 
luminosity function parameters. For this they test both pure density and pure luminosity 
models. In their analysis the probability distribution in the likelihood for the observables 
is described instead by Poisson probabilities. The luminosity and redshift space (L — z) 
distribution is gridded such that the likelihood is defined as the product of the probabilities 
of observing either 1 or quasars in each cell such that 

N N 

= n ^(^«' L,)dzdLe-^^''-^'^'^''^^ Yl e-M^. ,i.)d^<ii^ (43) 

i j 

where the quantity \{zi,Li)dzdL represents the expected number of objects in each cell 
in the L — 2: plane. The j index takes into account cells where no objects were observed. 
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This form of the likelihood has proved popular and has been widely applied. Choloniewski 
(1986) adopted the method and applied it to the CfA survey data. More recent examples by 
Boyle et al. (2000) studied the quasar LF in the 2df-QS0 survey and Wall et al. (2008) for 
exploring a sub-millimetre galaxy sample from the GOODS-N survey. As I will discuss in 
more detail in § 7, Christlein et al. (2009) also drew on this approach when incorporating 
photometric redshift estimates into the MLE. 

Saunders et al. (1990) used the STY79 approach not only to constrain the LF but in- 
stead integrate over the comoving volume to determine the radial density field. In this way 
no knowledge of the LF is required. They demonstrated that by parameterising the radial 
density function p(| r j) they can fit it as a step function and obtain the variation on the MLE 
as 

C = T[ (44) 

l}j p{z,)[dV/dz)dz- ^ ' 

Heyl et al. (1997) generalised STY79 by constructing a statistical framework to explore 
how the LF evolves with redshift. This generalisation also allowed for the combination of 
multiple samples (see also Avni & Bahcall (1980) extension of V/Vma,x in § 5.2). 

For the study of the LF as applied to galaxy clusters Andreon, Punzi, &. Grado (2005) 
extend STY79 by developing a likelihood approach that retains the normalisation and ac- 
counts for Poisson fluctuations and is cast in the context of two density probability func- 
tions: one describing the signal (the cluster LF), the other accounting for background galaxy 
counts from the observations of many individual events (the galaxies luminosities), without 
knowledge of which event is the signal and which is background. Given j data-sets of e.g. 
cluster 1, cluster 2... etc.. each comprising of Nj galaxies, the likelihood is maximised such 
that 

ln£= I E Inp.-^A (45) 

clusters, j \ galaxies, i J 

where pi = p{mi) is the probability of the ith galaxy of the jth cluster to have an apparent 
magnitude jn^. The quantity s is the expected number of galaxies given the model, evaluated 
by 



J m 



(f>{m)dm, (46) 



where m][^-^ and mf^^ are the respective bright and faint limiting magnitudes of the jth clus- 
ter data in this example. In their analysis the LF is modelled as a convolved power-law and 
Schechter function. The goodness-of-fit was determined adopting the test. In Andreon 
(2006) and Andreon et al. (2006) they extend this approach within a bayesian framework 
and apply a Markov Chain Monte Carlo algorithm to constrain the LF parameters (MCMC 
Metropolis et al. 1953). See also e.g. Andreon et al. (2008); Andreon (2010) for further ap- 
plications of the method. 

In § 6 further applications of the MLE to bivariate distributions are discussed in greater 
detail. 



5 The Traditional Non-Parametric Approaches 

One of the main difficulties in constructing an accurate LF from flux-limited survey samples 
is the issue of completeness. I have discussed some of the major problems with galaxy detec- 
tion. However, in general terms we are hindered observationally by the notorious Malmquist 
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bias effect. This effect means we are biased to observe intrinsically brighter objects at higher 
redshifts and observe only the fainter objects over smaller volumes nearby. In this section I 
will discuss all the various non-parametric weighting schemes that have been devised over 
the years to correct for this bias. 



5.1 The classical approach 

The classical method, as coined by Felten (1977), represents the first rudimentary binned 
number count approach to determining the LF and was initially developed and applied by 
e.g. Hubble (1936b), van den Bergh (1961), Kiang (1961), and Shapiro (1971). However, 
as pointed out in BST88 the method was not formally introduced until the publications of 
Christensen (1975), Schechter (1976) and Felten (1977). 

The underlying assumption of the method is that the distribution of sources within the 
data-set in question is spatially homogeneous i.e. with no strong large-scale clustering. From 
this starting point we count the number of galaxies N within a volume V such that 

N 

4>=y (47) 

The volume, V{M), is calculated for the maximum distance that each galaxy with an abso- 
lute magnitude. Mi, could have and still remain in the sample. As an example, Felten (1977) 
applies the following expression (neglecting A'-corrections) to calculate the volume, 

y (M) = ^TT exp[0.6(mii,„ - M, - 25)] (48) 

jp^na 1 1^^ ^2 (0.6a In 10 CSC femii, 
(0.6a In 10) — 



where mii„i is the apparent magnitude limit of the survey, a and bnjin are related to the 
directional-dependent galactic absorption calculation, and E2{x) is a second exponential 
integral (Abramowitz &. Stegun 1964, Chap 5). 

The number of galaxies. A*', within the absolute magnitude limits of the survey, M, is 
binned into a histogram (see Felten 1977; BST88) with each bin divided by V{M) to convert 
the histogram to units of mag~^Mpc~^ and return a differential estimate of the LF, 0(A/). 

Whilst this method is relatively straightforward to apply, its basic assumption of homo- 
geneity is well understood to be a handicap. At the time when galaxy surveys were shallow 
it was common practice to exclude clustered regions such as the Virgo cluster and members 
of the Local Group to try and avoid biasing in the shape and thus the parameters of the LF 
(Felten 1977). Also, Felten (1976) showed mathematically that the classical test gives a bi- 
ased estimate of the LF and provides expressions for the fractional bias and the variance of 



5.2 The l/Knax and V/Vmax estimators 

A natural development from the classical approach is now the famous F/Vmax test first de- 
scribed by Kafka (1967); Rowan-Robinson (1968) but more formally detailed and applied in 
the much celebrated paper by Schmidt (1968) to assess the uniformity and the cosmological 
evolution of quasars at high redshift (see also Schmidt 1972, 1976). As with the classical 
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Fig. 4 The construction of the traditional Schmidt (1968) V/Vma,x test. The basic construction of the method 
considers the ratio between the volume V, in which a galaxy is observed to the volume Vmax, the maximum 
volume the said galaxy could occupy and still be observed. 



method, V/Vmax assumes spatial homogeneity. F/Vmax is essentially a completeness esti- 
mator and Fig. 4 illustrates its construction. The basic principle of the test is simple and is 
defined by considering two volumes: 

- V, the volume of the sphere of radius R, where R is the distance at which a galaxy was 
actually detected, compared to 

- Vmax, the maximum volume within which a galaxy could have been detected and still 
remain in the catalogue in question. Thus, Vmax is the volume enclosed at maximum 
redshift, 2:max at which the galaxy in question could still have been observed. 

Assuming that the distribution of objects within the survey sample is homogeneous, then it 
follows that the value V/Vmax is expected to be uniformly distributed in the interval [0,1]. 
Thus, for a complete sample with no evolution V/Vmax has expectation value 

\ vmax / ^ 

with an often quoted statistical uncertainty of l/(12Af)^/^, where A'' is the total number of 
objects in the sample (see e.g. Hudson & Lynden-Bell 1991). In reality, the value calculated 
from V/Vmax for a survey will deviate from 1/2. By how much the value deviates from 1 /2 
is usually considered to be either a signature of incompleteness (e.g. Malmquist bias) and/or 
an indication of evolution: a value that is greater than 1/2 would imply a density evolution 
where galaxies were more numerous in the past, where as a value less than 1/2 would imply 
that galaxies were less numerous in the past. 

In the same paper, Schmidt also outlined a variation of this statistic that could be used 
to estimate the LF under the condition where the maximum distance r-max an object could 
have and still be included in the sample was independent of its direction. 



(50) 
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Once again it was Felten (1976) wlio would dub this estimator as the 'Schmidt's estimator'. 
The comoving volume can be determined by evaluating, 



where is the solid angle of the survey, is the luminosity distance as given by Equa- 
tion 11, and Zjnin and Zmax is redshift range of the sample. The quantity A{z) is related to 
the transverse comoving distance and defined as 



where flm, fik and f}y[ are, respectively, the matter density, curvature and cosmological 
energy density constants. 

If one assumes Poisson fluctuations and homogeneity within each bin then a standard 
approach (see e.g. Condon 1989) to error estimation is simply computing the rms on each 
bin. 



As we shall see in the following section, if these assumptions breakdown Bales (1993) pro- 
vided an more rigorous modified approach to error estimation. 

5.2.7 Development and variations of the Vmax estimator 

Although strictly speaking l^/Vmax is a completeness estimator I have included its develop- 
ment in this section as it is so intimately linked with the 1 /Vmax estimator for constructing 



Since its inception, 1/Vmax has remained a popular estimator for determining lumi- 
nosity functions and as a probe of evolution, most likely due to its simplicity and ease of 
implementation. However, as we will see in § 8 a survey sample with strong clustering 
properties will bias the slope of the recovered LF. Nevertheless, the Vmax estimators have 
evolved, been improved and refined over the years to accommodate the many different types 
of survey that have steadily grown in size and complexity. Selected below is a summary of 
some of the most significant developments of the Knax method. 

Huchra & Sargent (1973) were the first to extend its use to galaxies from the Markarian 
lists I to IV (see Markarian 1967, 1969a, b; Markarian, Lipovetskij, & Lipovetsky 1971) and 
perform V/Vmax as a completeness test whilst including the Virgo Cluster and the Local 
Group. They showed that the effects of including such clusters had a minimal impact on 
their results. Furthermore, they calculated the space density <P{M) via Schmidt's 1/Vmax 
estimator, where they summed over all galaxies within absolute magnitude intervals. 

Felten (1976) made an extensive comparison of 1/Vmax with the classical test. This paper 
derives a generalised version of 1/Vmax between absolute magnitude ranges Mi < A/ < 
M2 to give 




(51) 



A{z) = y^nm{i + z)3 + %(i + 2)2 + n^, 



(52) 
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Fig. 5 Avni & Bahcall (1980): The top panel shows the construction of the generalised y/Vmax for two in- 
coherent overlapping samples (B and D) into two region independent samples, (B-C) and D (see Equation 55). 
The middle panel shows the construction of Ve/Va for a coherent sample constructed from two overlapping 
samples (B and D). The shaded region in the bottom panel represents Ve for the case z < z^n^{F) in 
Equation 57. 
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and shows that it is superior to that of the classical estimator by being an unbiased estima- 
tor. 



Avni & Bahcall (1980) generaUsed V/Vmax for multiple samples for two distinct cases: 

1. Firstly, for combining independent multiple samples that are still physically separated. 

2. Secondly, for combining independent samples in which the individual samples are over- 
lapping. 

In the first scenario they consider complete 'incoherent' samples which do not overlap on 
the plane of the sky that could either be initially non-overlapping, or could be constructed 
from overlapping samples as illustrated on the top panel in Fig. 5. The term 'incoherent' 
refers to combining samples in which one remembers for each object its parent sample. 

In this particular case the V/Vmax statistic can be constructed from overlapping samples 
dividing the space into two distinct volumes (B-C) and D. For this method they show that a 
combined sample average of V' /V'max is given by 

V \ Nb-c / V \ ^ iVp ^33^ 



'l^max /b_c,d ^B-C + A^D \ V^a^ / Nb-C + No \ K 

where V' represents the density-weighted volume, A^b-c is the number of objects in sample 
(B-C) and Nd is the number of objects in sample D. 

The second scenario considers the simultaneous analysis of independent complete 'co- 
herent' samples in which the individual samples are physically joined and a new statistic, 
Ve/Va, is constructed (see illustration in the middle panel of Fig. 5). By this description, 
'coherent' refers to the method of combining independent samples. Here, a new variable Va 
is defined as the density-weighted volume available to an object for being included in the 
coherent sample. This new volume is defined as 

Va[F^] = ^^V'[zl,^{F,)] + ^v'lz^aAm (56) 

where Ob-c and Qd are the solid angles subtended on the sky and Ei is flux of the object. 
The second new variable is defined as the density-weighted volume enclosed by an object 
in the coherent sample and is given by 



Vl[z„F,] 



'^V'{z,) + q^V'{z,), z,<z^,^{F,) 
^V'iz^^aAF.)] + ^V'iz,), z, > Z^^^F,) 



(57) 



This first case in Equation 57 is illustrated in the bottom panel of Fig. 5. This leads to the 
sample average of V^/V'a being defined as 

vi/-N^\^\^mr] ^^^^ 

where N^ is the total number of objects in the two combined samples. 

Hudson & Lynden-Bell (1991) recast V/Vmax for analysis of the diameter function of galax- 
ies. Therefore, for diameter-limited catalogues which have both a maximum and minimum 
diameter cut-off they show that the completeness test can be written as 
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where 9 is the major diameter of a given galaxy, 6ii^ is the lower diameter limit of the sur- 
vey and 6max is the maximum diameter cut-off of the survey. 

Bales (1993) provided a more rigorous and generalised treatment for estimating errors for 
surveys sampling smaller volumes than the quasar samples examined in Schmidt's original 
work. Consequently, the error estimation laid out by Bales takes into account effects from 
strong clustering of galaxies and shot noise. The key to this approach draws from Peebles 
(1980) by incorporating the variance in the number of galaxies within successive i slices 
of redshift and absolute magnitude of a parent sample 

Var(iV,) = ((iV, - {N,)f) = J ^ir)dV + j j ip{r^)ip{r^)i (In - ral) dV^idFa, 

(60) 

where r is a position vector and the integrals are over the comoving volume subtended by 
the solid angle of the sample bounded by the redshift limits. The 2-point correlation function 
is given by £,{r), which accounts for the error due to clustering and the selection function, 
and ^p{r), taking into account the contribution of shot noise. Bales goes on to show that the 
total error in the LF is then estimated by 

(6.) 

In the same paper the generalised 1 /Vmax estimator introduced by Felten (1976) is extended 
to examine the evolution of the LF as a function of redshift. Similarly, van Waerbeke et al. 
(1996) looked specifically at the effects of pure luminosity evolutionary models on QSOs 
via the Vmax estimator to constrain cosmological parameters. 

Qin & Xie (1997) generalised the now familiar Schmidt notation in terms of a new statistic 
called n/rimax that is applicable to any kind of distribution of objects. This, therefore, would 
be an improved measure of the traditional V/Vmax test where the estimator is weighted dif- 
ferently and the distribution in question is assumed to homogeneous. This fitting technique 
demonstrated that if the adopted LF is correct then the distribution of n/nmax is uniform on 
the interval [0,1] , 

f<P{M,z)dV{z) 

"Mf^ _ ^ ^ (62) 



[M, (M)] 2„,ax(Af) 

/ <P{M,z)dV{z) 



and the authors showed that its expectation value (n/wmax) is 1/2. 

Following from this, another statistic, o/omax, based on the cumulative LF and indepen- 
dent from n/nmax was introduced by Qin & Xie (1999): 

M 

J <P{M, z)dM 
o{M, z) ^ 

0max(2) M^^^iz) ■ ^ 

/ ${M,z)dM 



This statistic combined with that of Qin & Xie (1997) are designed to provide a sufficient 
test for any adopted LF form. In the latter paper they apply both estimators to AAT sample 
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data from the UVX survey (Boyle et al. 1990). 

Page & Carrera (2000) improved the method to take into account systematic errors intro- 
duced for objects close to the flux limit of a survey. As they point out, for evolutionary stud- 
ies of galaxies the traditional approach, as extended by Avni & Bahcall (1980) and Bales 
(1993), is very common (see e.g. Maccacaro et al. 1991; Ellis et al. 1996) but can distort the 
apparent evolution of extragalactic populations. Through the use of Monte Carlo simula- 
tions, with a sample of 10,000 objects and simulating an unevolving two-power law model 
X-ray LF, they compare the l/Vmax estimation of the differential LF given by 

to their improved binned approximation of the (pest, which assumes that <f) does not change 
significantly over the luminosity and redshift intervals AL and Az, respectively, and is de- 
fined as 

N 

(t>est = , (65) 

/ / {dV)/{dz)dzdL 

where TV is the number of objects within some volume-luminosity region. 

Miyaji, Hasinger, & Schmidt (2001) extend the Page & Carrera (2000) method for the study 
of active galactic nuclei (AGN). To help reduce biases from binning effects they employ a 
parametric model to correct for the variation of the LF within each bin. With this weighting 
scheme, the binned LF is calculated by 

lyobs 

f(L„.,)=^'{i.,^.r°'^'^^, (66) 

where Li and z^ represent the luminosity and redshift at the centre of the ith bin. ${Li, z;)™ 
is the best-fit model evaluated at Li and zi. N"^^ is the number of observed objects in each 
bin and is the number of objects estimated from the best-fit model. The method still 
assumes homogeneity within each bin and obviously requires that the model accurately de- 
scribes the data. However, it is remarked that Poisson statistics can be used to compute the 
errors. This approach has been more recently applied by e.g. Croom et al. (2009b) for their 
studies of quasi-stellar objects (QSOs). 

Chapman et al. (2003) uses 1/Vmax in order to construct the bivariate luminosity function 
(BLF) <P{L, C), in luminosity L and colour C. This method is discussed in greater detail in 
§ 6 which reviews various BLF techniques. 

To account for low precision photometric redshift estimation Sheth (2007) incorporated the 
dN/dzpfioto distribution into Knax by adopting a deconvolution technique. This method will 
be detailed in § 7. 

Tojeiro & Percival (2010) provide a variation of the traditional F/Knax estimator to probe 
evolution of the Sloan Digital Sky Survey (SDSS) DR7 Luminous Red Galaxies (LRGs) 
(see Eisenstein et al. 2001, for main sample selection). As pointed out by the authors, recent 
studies of evolution in LRG samples employs a procedure that constructs pairs of samples 
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Fig. 6 Image courtesy of Tojeiro & Percival (2010). In the above schematic three populations (i.e galaxies 
with similar colours and luminosities) are considered. For Va in the population 1 scenario galaxies are given 
a weight equal to unity. However, a galaxy in Vb in this population is down-weighted. If a galaxy is in 
the population 2 sample and therefore confined to one slice only it would be given a weight of zero. The 
population 3 galaxies are those that can been see throughout the entire survey sample. This represents a 
volume-limited sample where the corresponding weight is always unity. 



- one being at high redshift and the other at low redshift. However, matching the individual 
galaxy properties between the two samples traditionally requires the removal of galaxies 
that could not have been observed due to a varying selection function (see e.g. Wake et al. 
2006). To overcome this, a weighting scheme is constructed to the two redshift slices: Va 
(for high redshift) and Vb (for low redshift). The weighting scheme down-weights galaxies 
to keep the total weight of each galaxy population the same in the different redshift slices. 
Therefore, each galaxy in Va or Vb is, respectively, weighted by 

V ( V^ ■ V^ ) 1/ ( V^ ■ V^ ) 

max, I ( ) max, 2 ( ) 

This is illustrated in Fig. 6. However, as the authors carefully note, this approach provides 
a weighting scheme only and is not a completeness estimator unlike the traditional Schmidt 
test. As such, incompleteness may still be inherent in the parent sample that is under test. 
Moreover, for a survey constructed by a magnitude-limited sample, the Schmidt estimator 
would be applied instead. 

Lastly, there has been a more recent paper by Cole (2011) in which he generalises l/Vmax 
with a density corrected Vdc.max estimator that takes into account effects from density flue- 
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tuations within a given volume. The development of such a method was used to provide 
an algorithm which generates magnitude-limited random (unclustered) galaxy catalogues, 
which take into account both the correlation function and galaxy evolution. 

5.3 The C~ method 

As already discussed, the drawback in the use of the 1/Vmax is the assumption that the dis- 
tribution of objects is spatially uniform. The increase in the number and variety of redshift 
surveys over the years has confirmed that galaxies have strong clustering properties. Natu- 
rally, this can introduce a bias in constructing the differential LF. However, it was not long 
before alternative approaches were developed that could circumvent this problem. 

The method was introduced by Lynden-Bell (1971), where he applied it to the quasar 
data of Schmidt (1968). It is a maximum likelihood procedure adapted from the survival 
function and does not require any binning of data. Instead, the method estimates the 
cumulative luminosity function (CLF). It has the advantage over the classical and 1/Vmax 
methods as it does not require any assumptions about the distribution of objects within the 
data-set. Furthermore, as remarked by Petrosian (1992), all non-parametric methods are 
essentially variations on the method in the limiting case of having one galaxy per bin. 
Its mathematical properties have also been examined rigorously in the statistical literature 
(see Woodroofe 1985; Chenetal. 1995). 

In practice the method recovers the CLF without normalisation with the use of a weighted 
sum of Dirac (5-functions (thus assuming no errors in magnitude). As we shall see, to account 
for errors in magnitude one can add a smoothing kernel into the procedure. To understand 
how the method works let us firstly consider the following. In an ideal scenario where one 
is not restricted by observational constraints such as faint and bright apparent magnitude 
limits, constructing the cumulative luminosity function (CLF), <P{M) would be a relatively 
trivial task. However, these limitations, in reality, lead us instead to observe a sub-population 
of galaxies sampling a CLF which we can refer to as X{M) (Subbarao et al. 1996; Willmer 
1997). Thus we find that 

d'P dX 

-¥ > -X- (^8) 

Let us now consider this observed distribution of galaxies in terms of the absolute mag- 
nitude, M, and distance modulus, Z, plane M — Z, and assume (as with all the methods 
described so far) separability between M and Z (see Fig. 7 ignoring the red coloured mark- 
ings for the moment). From this figure we can see the sharp faint apparent magnitude limit 
m[j^-, of the survey indicated by the diagonal line. We can now write the probability density 
function of M and Z for the observable population as 



where, 0{m\:^^ — m) is a Heaviside function describing the magnitude cut, mjj^ defined as 



dP = p{Z)dZ (j)[M)dM ©(mfi 



lim 



m) 



(69) 




(70) 



However, to determine ${M) given the observed X{M) we can construct a subset re- 
gion of X we call C~ (M) where we are now able to obtain 
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Fig. 7 The construction of the method introduced by Lynden-Bell (1971). Cfc defines a region for 
each galaxy (A/fe, Zf^)m the sample. All the galaxies in this region are counted excepted for (Mj., Z^). Mmin 
is imposed by the brightest galaxy in the survey data. Similarly, Z^i^ is the minimum distance modulus 
defined by the nearest galaxy at redshift 2,nin ■ 



In this scenario, the integrated CLF can be written in form, 

v— oo 

where, is a normahsation factor and X{M) represents the parent set of points within 
which one constructs the C~ {M) subset. However, we require the differential luminosity 
and density distribution functions which can be represented by a series of Dirac 5 functions, 
respectively, given as 

N 

<P{M) =J2A5{M - M,), (73) 




N 

p{Z) = Y,p,&{Z-Z,), (74) 

i=l 
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where tpi and pi are the respective step sizes. The distance modulus, Z, is calculated by 

Z = m-Af = 51ogio(di) + 25 (75) 

where is the luminosity distance to the object and m is the apparent magnitude. To then 
construct the CLF the {Mi^,Zf.) data are firstly sorted from the brightest to the faintest 
galaxy such that A/^+i > for fc = 1, A^, and a region on the plane for each galaxy 
located at A/j, defines the C~(Affc) function such that 



Ck = C-{Mk), k = l,...,N{ 



,<M<M', 

(76) 

< Z < Z' 



as illustrated in Fig. 7. According to Jackson (1974), the superscript 'minus' is to emphasise 
that the point at (Afj., Zj.) is not included when evaluating C~ (Mj.). The coefficients of the 
LF are determined from the recursion relation. 



V'fc+i = v^fc „_ . (77) 



Therefore, the CLF is given by, 

M 



(k{M)dM = ^i H , (78) 



In Equation 78 (Ci +1)/Ci is set to unity so that the product begins with k = 2 (Choloniewski 
1987; Takeuchi etal. 2000). 

5.3.1 Extensions to the method 

Although Jackson (1974) extended the original method to account for the combining of mul- 
tiple data-sets and deriving suitable error estimates, the method remained limited to deriving 
only the shape of the probability density function. However, Choloniewski (1987) revisited 
and improved the method by not only simplifying it, but by properly normalising the LF 
and estimating the density evolution of galaxies simultaneously. However, Choloniewski's 
improved version has only been applied sporadically over the years (see e.g. Warren et al. 
1988; Takeuchi et al. 2000; Cuesta-Bolao & Serna 2003; Takeuchi et al. 2006). A paper by 
Schmitt (1990) extended the method for samples with multi-flux limits. 

Caditz & Petrosian (1993) introduced a smoothing non-parametric method based on a 
Gaussian kernel, which replaces the 5-function in Equation 73 and 74 with 



" 1 



-(i-a;,) 



(79) 



where x represents the observables, K{x) is the kernel function, d is the number of measured 
parameters for each object. The parameter is a free parameter which defines the magnitude 
of the smoothing scale. The kernel therefore replaces the (5-function distributions given in 
Equations 73 & 74 which can otherwise limit the use of C towards the faint magnitude 
limit of a survey. 
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Subbarao et al. (1996) extended the method for photometric redshifts by considering, for 
each galaxy, the probability distribution in absolute magnitude M resultant from the photo- 
metric redshift error. By adopting a Gaussian error distribution for the function z*(rni, M), 
the redshift for the ith galaxy with an apparent magnitude m^, they showed that for a com- 
plete magnitude-limited sample the defined region C~ {M) is now given as 



C'{M) = 0.5^ 



erfc ( _ erfc ^ 



(80) 



where erfc(a::) is the complementary error function. 

Willmer (1997) included Lynden-Bell's approach applied to simulated data and the 
CfA - 1 survey (Huchra et al. 1983) when comparing various LF estimators, and is discussed 
in more detail in § 8. 

Zucca et al. (1997) provided a variant on the original Lynden-Bell version termed the 
estimator, which they then incorporated into their "Algorithm for Luminosity Function" 
(ALF) tool (see e.g Ubert et al. 2004, 2005; Zucca et al. 2009). Essentially Equation 77 is 
redefined such that for each fc*^ galaxy, the contribution to the LF is given by 

where, galaxies for the ip{Mj) summation are sorted from the faintest to the brightest and 
C*" (A/^,) is the number of galaxies in the region, 

M < Mfc, 

C7+(Mfc), k = l,...,N{ (82) 

2'min < Z < ^max,fe(A'ffc) 

Thus to obtain a binned version of the LF, the quantity tp{Mj.) is estimated for each galaxy 
and the returned values binned in absolute magnitude. 



5.4 The 4>/$ method 

Originally introduced by Turner (1979) and Kirshner et al. (1979) the method (as coined 
by BST88) is a natural progression from the classical method (§ 5.1) that returns a binned 
estimate of the LF. For a magnitude-limited sample we calculate the ratio of the number 
of galaxies in the interval dM, N(dM) to the total number of galaxies brighter than M, 
N{< M) within the maximum volume assuming a complete sample: 



N{dM) _ dN{< M) _ ^{M)p{z)dMdV _ 4>{M)dM ^, ^^^^^^^^^ (gg^ 



iV(< M) N(< M) M $(M) 

J <t>{M')p{z)dM'dV 

— oo 

where p{z) is the density function and <P{M) is the integrated LF. It is clear from this equa- 
tion that the density functions cancel, thus rendering the estimator independent of the distri- 
bution of galaxies. This estimator has been further developed slightly - Davis et al. (1980), 
Davis & Huchra (1982) to bin the data in equal distance intervals. However, as shown in 
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de Lapparent et al. (1989) the approximation in Equation 83 introduces a bias in the determi- 
nation of the LF for large dM. To avoid this bias it has been common place to instead assume 
an analytical form for the LF as in Turner (1979), Kirshner et al. (1979), Davis & Huchra 
(1982) and de Lapparent et al. (1989). So although by its original construction this estima- 
tor is non-parametric, subsequent applications have found its place to be more useful as a 
parametric one. 



5.5 The step-wise maximum likelihood method 

One of the first embodiments of the step-wise MLE approach, was presented by Nicoll & Segal 
(1983) and can be thought of as a binned version of the Lynden-Bell's method. This was 
applied in their analysis of chronometrical cosmology and also considers variations of pro- 
gressive truncation in apparent magnitude as well as multivariate complete samples. 

A more advanced version of Nicoll and Segal's approach by Choloniewski (1986) ap- 
plied the same Poisson probability distribution in the MLE of Marshall et al. (1983) (see 
section 4). In this paper, the data are projected on the absolute magnitude M and distance 
modulus Z plane and divided into equal sized cells such that 

MG [M,_i,M,], i = l,...,A, (84) 

ze[Zj-i,z,], j = i,...,B, 



The likelihood function can be represented as 



■1=1 j=i '3 

where 



= ^<l>i{M)pj{Z)dM dZ, (86) 

where Nij is the number of galaxies in the cell, h is the mean density of the sample, 
${M) is LF, and p{Z) is the density function where the separability between <P{M) and 
p{Z) is assumed. 

The method which is would become known as the step-wise maximum likelihood method 
(SWML) was introduced by Efstathiou, Ellis, & Peterson (1988) (hereafter EEP88) and rep- 
resents the non-parametric version of the STY79 method. As its name implies the technique 
does not depend on an analytical form for 4>{M). Instead the LF is in effect parameterised 
as a series of Np step functions allowing us to define the following initial setup: 

#(A/) = 0fc, Mfc- — <M<Mfc + — , (87) 
where. A; = 1, Np 
The differential LF can then be expressed as 

N 

cl>{M) = Y,(l>,W{M,-M), (88) 
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where W{x) represents two window functions, 

lO, otherwise. 

Therefore, it can be shown that the expression for the step-wise likelihood is given by 

N,.s E W{Mi - Mu)4>i 
^(W«}.=i / I {Mk}i=i k) = n -JT^ (90) 

*^=i E cbiH{MiU^k) - Mi)AM 
1=1 



where 



H{x) = < 



0, x< -AM/2, 

{x/AM + 1/2), ~AM/2<x< AM/2, (91) 

1, x> AM/2. 



In EEP88 the errors on the LF parameters are assumed to be asymptotically normally dis- 
tributed giving a covariance matrix, 

cov(0fe) = (92) 

where I{cf>k) is the information matrix (see Eadie et al. 1971). Strauss & Willick (1995); 
Koranyi & Strauss (1997) noted two drawbacks of the method. The first concerns discreti- 
sation of the LF using step functions. A bias is introduced into the selection function due to 
having discontinuous first derivatives. The authors instead suggest interpolating through the 
steps and then calculating the selection function to reduce this bias. The second drawback 
is the sensitivity of the LF to the choice of bin size. In Koranyi & Strauss (1997) they show 
by example, that if the total number of bins is too small, this can dramatically underestimate 
the faint-end slope of the LF. 

Heyl et al. (1997) extended the use of the SWML method by generalising it in a similar 
way as Avni & Bahcall (1980) did for V/Vma-x by combining various surveys with different 
magnitude limits, coherently. Moreover, this extension also provided an absolute normalisa- 
tion and was used to probe density evolution in the LF by spectral type. In the same paper, 
they compare this method against the 1/Vmax estimator (see § 8). 

Springel & White (1998) also explored evolution and provided a variation of the method 
that instead models the selection function as a series of linked piecewise power laws as 
opposed to step functions as in Equation 88. 



6 Bivariate Luminosity Functions 

The move beyond the standard LF analysis has seen constructing bivariate LFs (BLF) which 
have proven useful to further constrain galaxy formation and evolution theory. Much of the 
work described in this section represent natural progressions of the STY79 and the EEP88 
MLE estimators that have combined various observables such as galaxy radius, colour, 
galaxy diameters and surface brightness with space density. A more general approach re- 
cently developed by Takeuchi (2010) shall be discussed in greater detail in § 9. 
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6. 1 Galaxy radius 



Choloniewski (1985) constmcted a BLF with galaxy radius, ${M,\ogr). In this scenario 
the number of galaxies in the interval dMd{logr)dm can be written as 



n(M, log r,m) = <P{M,logr)f{m)p{Z)W°-^'-^\ 



(93) 



where Z = m — M is the distance modulus, /(m) is a completeness function which de- 
scribes selection effects and p{Z) is the density function. The likelihood function is then 
defined as 



/ -J +00 +CXD 

J dM J dm,^(M)/(m)p(m-M)100-6(™-A/) 



+ ^lnS'(M„log(r,;)), 



(94) 

The maximisation of the first component of the likelihood gives the shape of the LF 4>{M). 
However, the second component S(Af, logr) requires performing the MLE to the distribu- 
tion of galaxies in the M — log r plane which has the assumed form. 



S{M, logr) 



1 



exp 



(logr -I- aM -h \)Y 
2^^ 



(95) 



6.2 Colour 

Chapman et al. (2003) described a parametric approach to fit a bivariate distribution, ^ (L, C), 
in luminosity L and colour C, that utilises the 1 /Knax estimator. This paper is concerned 
specifically with the local infrared-luminous galaxies from the IRAS 1 .2 Jy sample selected 
at 60 [im (see Fisher et al. 1995, for full survey description). They use the IR colour distri- 
bution defined as the _R(60, 100) = Seofim/Sioofmi flux ratio (where 5* refers to the wave 
band) to explore colour- added evolutionary models. 

Recalling that the Knax estimator can be expressed as 

<p{L)AL = J2y^, (96) 



max.! 



where AL is the luminosity range within which we have a number density of galaxies, <P{L). 
To reiterate, Vmax.i represents the maximum volume that the ith galaxy can be located in 
and still be detected. 

They find that the population of sources in the IRAS sample is represented best by a 
lognormal distribution such that 



G{C)=exp 

2 \ ac 

The colour distribution is modelled analytically as 



(97) 



R(^)=C.y,(l + J^) Vl + ^V, (98) 



^ioo;="*n'^W v^-^j 

where Ltir is defined as the total infrared luminosity. S and 7 represent the slopes of the 
faint-end and bright-end of the LF, respectively. 
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The final BLF convolves the LF, <Pi{L), modelled as a two-power law with the colour 
function, $2{C), as defined above to give 

•P{L, C)dLdC = *i(L) X <p2{C)dLdC 

(l-a) 



(99) 



p,x — X 1 + — 



X exp 



1 f C-Co 
2\ ac 



dLdC, 



where the C is calculated using the observed IRAS broad-band fluxes. See also, Chapin et al. 
(2009), Marsden et al. (2010) for further extensions to this method. 



6.3 Galaxy diameters 

The SWML has now become a very popular method for determining the LF and Sodre & Lahav 
(1993) extended it (and the STY79 estimator) to a bivariate distribution of magnitudes and 
galaxy diameters (see also Santiago et al. 1996). In this variation the conditional probability 
of finding a galaxy with a diameter D and absolute magnitude M can be written as 

P{D, M) = P{M)P{D\M) = P{D)P{AI\D). (100) 

The bivariate distribution of diameters and magnitudes is then expressed as 

<F{D, M) dDdM = cl){D) ip{M\D)dDdM, (101) 

where 0(-D) is the diameter distribution function and if{M\D) gives the luminosity distri- 
bution for galaxies with diameter D. In terms of the step-wise likelihood, the distribution 
function of Equation 101 is now parameterised as A''^! bins in diameter and Nm bins in 
absolute magnitude such that, 

ifp,Af) = *',fe, j = l,...,ND, k = l,...,NM. (102) 

It can be shown that the log likelihood is given by 

N Nd Nm 

ln(£) = ^Y1Y1 ^^Jfc Hl'jkCv{D^/v*,M, + Z,)] (103) 

i-l j=l k=l 

N /No Nm \ 

^Y.^'^iY.Yl Hum>Pim^D^M J + constant, (104) 

where 

{1, if -{AD)/2<x<{AD)/2 
and - {AM)/2 <y< {AM)/2 (105) 
0, otherwise. 

In a general way the quantity Cy defines the velocity completeness function Cyifi"^ ^rn) as 
the fraction of galaxies with measured velocities v and with apparent diameters between 9 
and + do, and apparent magnitudes between m and m + dm. Zi is the distance modulus 
v* = _Di/6'™. Balletal. (2006) adopted this approach to construct the BLF with surface 
brightness. 
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6.4 Surface brightness 

Cross & Driver (2002) (see also Driver et al. 2005) extended the Sodre & Lahav (1993) ap- 
proach and that of EEP88 to determine the joint luminosity-surface brightness LF and thus 
recover the bivariate brightness distribution (BBD) using the Millennium Galaxy Catalogue 
(MGC) survey data. In this scenario they consider magnitude and surface brightness limited 
data where the effective absolute surface brightness is defined as 

if =m + 2.51og[27r(r-°)2] - 101og(l + z) - k{z) - e{z), (106) 

where is the true observed half-light radius and k{z) and e{z) are the respective K— and 
evolution corrections. To construct the bivariate SWML for this case they firstly define an 
observable window function given by 

0,{M,ff) = { and Ml^igh.^ < < MiW (107) 
0, otherwise. 

Taking the Sodre & Lahav (1993) approach they define Wij^ to weight each galaxy to ac- 
count for incompleteness such that 

m{Qz>3)' " ^'-'j - ^^^^ < ^'-'j + 



and -4f<f,l<f,l + 4f (108) 



0, otherwise, 

where Ni is the total number of galaxies lying in the same m~ ^f bin as galaxy i.Qz defines 
a redshift quality, where a Qz > 3 equates to a reliable redshift measurement. The quantity 
Ni{Qz > 3) is thus the number of galaxies which known redshifts in the same bin. Finally, 
to account for the fraction of the Mj — /x^ bin which lies inside the observable window of 
galaxy i they define the visibility function to be, 

Mj+AM/2 + 

Hvk = j^^^^ J dM I dM'0^(m,M'), (109) 

Mj-AM/2 til-Afj.'/2 

They then fit a functional form characterising the joint luminosity-surface brightness (BBD) 
distribution given by 

<,(M,M^) = Miii2<^a0°-^(^^*-^^)("+i)e-i°°"'"*-"' (110) 
\/27r(T^= 



X exp <; - 

where the upper part of Equation 110 has the usual Schechter function parameters and the 
lower part of the equation has additional parameters, /i"^* , cr^e and /?, to characterise the 
surface brightness data. 

Section 9 examines in more detail a new technique to the BLF by Takeuchi (2010) which 
offers a more generalised approach by using the copula to connect two distribution functions 
and the Pearson correlation coefficient to explore the correlation between the two. 
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7 Methods to Account for Large Uncertainties in Redshift Estimation 

The use of photometric redshifts (photo-z) is playing a more central role in probing the 
cosmological model (see e.g. Blake & Bridle 2005; Banerji et al. 2008). Measurements can 
be performed quickly and to high redshift and are therefore specifically suited to large galaxy 
surveys. However, as previously discussed, the resulting precision is substantially worse than 
measuring them spectroscopically. Consequently, the ability to accurately constrain the LF is 
severely impeded. In what follows, I discuss some of the modifications made to the standard 
LF estimators to incorporate photo-z measurements that are not precisely known. 



7.1 Smoothing kernel 

Already discussed in § 5.3 was the Subbarao et al. (1996) Gaussian error distribution which 
was incorporated into the method given by 

X{M) = O.S^erfc (^£>!l:i^I^^ , (HI) 

which led to estimating the C~ function as shown in Equation 80. This essentially trans- 
forms the traditional Lynden-Bell 'discrete' approach into a more 'continuous' one, where 
enws in the redshift distribution coupled with /^-corrections can now be represented by a 
smoothing function and integrated over the redshift probability distribution for each galaxy. 
To assess the errors and explore biases they use bootstrapping Monte Carlo techniques. 



7.2 Deconvolution - a Knax generalisation 

A paper by Sheth (2007) revisited the Schmidt Wax estimator and extended its use for 
surveys with measured photo-z by casting it as a deconvolution problem. To begin with the 
probability of estimating a redshift ze given its true value zt' is given by 



dNe{ze) _ f , dNjzt) 



dze 



f dNjzt) , I . 

/ dz — p{Ze\zt). 



(112) 



By now considering the generalised case in which a catalogue has both a minimum limiting 
volume, Vmin for which an object would be too bright to be included in the catalogue, and the 
usual maximum volume, Vmax, the number of galaxies, A'^, with true absolute magnitude, 
Mt, in a magnitude-limited catalogue is given by 

N{Mt) = (^(Mt)[V^max(AfO - V^UMt)]. (113) 

This equation is consistent within the usual Schmidt framework, where the LF would now 
be constructed by a l/[Fmax(Mt) — V^i-a{Mt)] weighting scheme. However, we are now 
required to consider the number of galaxies Ne with an estimated absolute magnitude Me 
which is shown to be given by 

Ne{Me) = j dMt<t>{Mt) j ddL^^^^^^^^ Xp{Mt-Me\dL,Mt), (114) 

di.(Mr'°) 



^ The subscript t refers throughout this section as the true measured value, which, in this case, would be 
derived from a spectroscopic redshift 
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where is the luminosity distance and Vcom is the comoving volume. In order to simplify 
the process it is assumed that p( Aft — Me [d^, Mt) does not depend on and so the number 
of estimated absolute magnitudes can be reduced to 



Ne{Me) = I dM0(Aft)V(l/max, \4r 



^t), (115) 
where, 

V{Vra.^,V^in,Mt)= | ddL^^^^^^^^p{Mt - Me\dL , Mt) . (116) 

An iterative deconvolution algorithm originally developed by Lucy (1974) is finally ap- 
plied to estimate the redshift distribution and thus the luminosity function. In Rossi & Sheth 
(2008) they apply this method via mock catalogues to probe their effectiveness in areas 
such as the size-luminosity relation which is often distance-dependent (see also Rossi et al. 
2010). 



7.3 Modifying the maximum likelihood estimator 

Using Monte Carlos simulations Chen et al. (2003) demonstrated that applying the standard 
STY79 MLE to photo-z data produces an intrinsic bias into the shape of the LF. This mani- 
fests as a steeper slope at the bright end of the LF caused by intrinsic fainter galaxies being 
scattered more into the brighter end than brighter galaxies being scattered into the faint end 
of the LF. To attempt to reduce this bias they convolve for each photometric redshift of 
galaxy i, the Gaussian kernel with the redshift likelihood function to give 



:) 



1 



zCl{z' - 2i;zi)exp 



2ai 





Thus, they show that the total likelihood function is then represented by 



dz . 



(117) 



W j i^iPiizi - z'; Zi)dz ^ J J ^iPii^i - Zi)dzdm 



m,„i„ 



(118) 



where is the fraction of the angular area sensitive enough to detect the galaxy rrii at a 
redshift zj and. 



10 



OA\M,-Mi{mi,z')\(l+a) 



exp -10 



-)OA\M,-Mi(mi,z 



(119) 



Applying this modification to the same simulated data showed an improvement in the re- 
covered input LF with a reduction in the systematic uncertainties of ~ 0.7 mag. 



Following from this, and in the same paper as his deconvolution technique for Vmax, Sheth 
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(2007) improves upon the work of Chen et al. by deriving a new likehhood. To begin with 
the general likelihood is expressed as 

^w-n^iSf. 020, 

i 

where S is the selection function, a represents the parameters that describe the shape of the 
LF and (j}{Li\zi, a) is the LF at a redshift z. 

If we now denote the estimated photometric redshift as Ze and thus the estimated ap- 
parent luminosity as le- Also, if k and zt are the respective true apparent luminosity and 
redshift then the number A'^ of estimated redshifts within a flux-limited catalogue is shown 
to be, 

Nize,ai) = I dz^^^^ I dlt47vdl{zt)x(l,{Lt\a.)p{ze\zt,lt), (121) 



where, Vcom is the comoving volume. The absolute luminosity, Lt = kiivdj^^z), where 
dL{z) is the luminosity distance. The joint distribution of Ze and le is then given by 



leN{le,Ze,a) = j dz '^^^°_^ ATld\{zt) X 4){Lt\s)p{ze\zt,lt)- 



(122) 



Thus the probability distribution by which the likelihood is to be maximised is such that 

N 

- I\ 

N{zt^e,&) 



1=1 



The modification by Christlein et al. (2009) attempts to define a likelihood function simi- 
lar in approach to that of M83 (as described in Equation 43) which, as suggested by the 
authors, would replace the photometric redshift of Sheth by a more robust observable. Just 
as M83 gridded the luminosity-redshift information, this method defines a parameter space 
that encompasses absolute magnitude Mq, SED and redshift zq of a galaxy. This so-called 
photometric space is treated as an n-dimensional flux space where each dimension corre- 
sponds to one filter band in which a flux, /, is measured. This space is gridded into cells 
such that each cell either contains or 1 galaxies. Equation 43 can then be written as 

N 

C = Ylx(me-'(f^^''fYle~'(f^^^f, (124) 

i j 

where \{fi)df is the expectation of the number of galaxies in the i*^ cell in photometric 
space. To map between the LF parameters and photometric space it is assumed that the pho- 
tometric data are normally distributed centred on the expectation value for a given absolute 
magnitude Mq, SED and redshift zq. Thus \(fi) is defined as 

m) = Y.J dMoJ dzo (^) pM^\Mo;SED;zo) ^^^j^.sEB;z,\P), (125) 
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where P represents the LF parameters, dVc/dz is the differential comoving volume and pa 
represents the fraction of galaxies that contribute to the galaxy density at a point fi. The 
standard Schechter LF of the form in Equation 9 was adopted to which an evolutionary term 
was added. To constrain a galaxy to a given z, SED and M the above equation is modified 
as 



where zt, SEDt and Mt are the true values to which a given galaxy is constrained. The 
standard Schechter LF of the form in Equation 9 was adopted which an evolutionary term 
added. 




X <5(SED - SEDt) X S{M - Mt) x (Mq; SED; zo\P), 



(126) 



Table 1: Summary of the traditional LF estimators. 



Estimator Type Developed by Section Pros Cons 



Maximum Likelihood (MLE) Parametric Sandage et al. (1979) § 4, Page 17 No binning required < c - 
No built-in normalisation 



Extensions to MLE: 



Provide normalisation 


Davis &Huchra (1982) 


§4.1, Page 18 


C modelled by Poisson probabilities 


Marshall et al. (1983) [M83] 


§4.2, Page 19 


Provides error estimates 


Efstathiou et al. (1988) 


§4.1, Page 18 


Combining multiple data-sets 


Heyl etal. (1997) 


§4.2, Page 19 


Galaxy clusters 


Andreon et al. (2005) 


§4.2, Page 19 


Bivariate LF: 






Galaxy radius 


Choloniewski (1985) 


§6.1, Page 35 


Colour (utilises l/Vmax) 


Chapman et al. (2003) 


§ 6.2, Page 35 


Photo z: 






Gaussian kernel 


Chen et al. (2003) 


§ 7.3, Page 39 


Recast as conditional probability 


Sheth (2007) 


§ 7.3, Page 39 


Adopts M83 method 


Christlein et al. (2009) 


§ 7.3, Page 39 



Classical Method ^ N jV) 



Non-Parametric 



{Christensen(1975) 
Schechter(1976) 
Felten(1977) 



i 5.1, Page 21 



No assumption of form of LF 



J Biased by clustering 

1 Returns bias LF estimate 



Extensions to Vmax^ 
Combining multiple data-sets 
Diameter limited surveys 
Reduce clustering and shot noise bias 



Non-Parametric Schmidt (1968) § 5.2, Page 21 



Avni & Bahcall (1980) § 5.2.1 

Hudson & Lynden-Bell (1991) § 5.2.1 

Eales (1993) § 5.2.1 



r No assumption of form of LF 
Easy to apply 
Many improvements made 
Provides test of evolution 

[provides test of completeness 



Biased by clustering 



Continued on next page 



Table 1 - continued from previous page 



Estimator 



Type 



Developed by 



Section 



Pros 



Cons 



^T'/'tt-max & o/omax Variants 
Reduce bias toward flux limit 
Fit model to reduce bin bias 
Tracing LRG evolution across z-slices 

Bivariate LF: 

Colour (utilises MLE) 

Photo z: 

Deconvolution technique 



Qin&Xie(1997, 1999) 
Page & Can-era (2000) 
Miyajiet al. (2001) 
Tojeiro & Percival (2010) 

Chapman et al. (2003) 

Sheth (2007) 



§5.2.1 
§5.2.1 
§5.2.1 
§5.2.1 

] 6.2, Page 35 

] 7.2, Page 38 



The C~ method 

Extensions to the C ~ method: 

Combining multiple data-sets 

& derives eiTor estimates 
Simplified & includes normalisation 
Smoothing kernel 

Combines methods of C87 & CP93 
variant 
Photo z: 

Gaussian enor distribution 



Non-Parametric Lynden-Bell (1971) § 5.3, Page 29 



Jackson (1974) § 5.3.1, Page 31 

Choloniewski (1987) tC87] § 5.3.1, Page 31 

Caditz & Petrosian (1993) tCP93] § 5.3.1, Page 31 

Takeuchi et al. (2000) § 8.3, Page 48 

Zuccaetal. (1997) § 5.3.1, Page 31 

Subbarao et al. (1996) § 5.3.1, Page 31 



No parametrisation of the LF required 

recovers the CLF 

hidependent of clustering effects 



Provides normalisation 



No built-in normalisation 



^, , , ^ , , 1 Turner (1979), r. r- a I Biased estimator 
The (p method Non-Parametnc < §5.4, Page 32 Independent ot clusterina effects < 
I Kirshner et al. (1979) " 1 CoiTclated eiTors 

Extensions to 0/*: 

Adopting parametric form of the LF e.g. de Lapparent et al. (1989) § 5.4, Page 32 



Continued on next page 



Table 1 - continued from previous page 



Estimator 



Type 



Developed by 



Section 



Pros 



Cons 



The Step- Wise Max Likelihood 

Extensions to the SWML: 

Combining multiple data-sets 
Model S(z) as piecewise power laws 
Inteipolating through steps 

Bivariate LF: 

Galaxy diameters 

Surface brightness 



Non-Parametric 



INicoll& Segal (1983), 
Choloniewski(1986) 
Efstathiou et al. (1988) 

Heyl et al. (1997) 
Springel& White (1998) 
Koranyi & Strauss (1997) 

Sodre&Lahav(1993) 
Driver et al. (2005) 



i 5.5, Page 33 



] 5.5, Page 33 
\ 5.5, Page 33 
] 5.5, Page 33 

j 6.3, Page 36 
I 6.4, Page 37 



No assumption of form of LF 
robust estimator 

Independent of clustering effects 



Reduce selection f"^ bias 
Reduce selection f^ bias 



No built-in normalisation 
selection f^ bias 
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8 Method comparison review 

Thus far all the traditional LF estimators have been summarised along with their most rele- 
vant modifications and extensions. This aim of this section is to highlight some of the key re- 
sults from papers by Heyl et al. (1997) (H97), Willmer (1997) (W97), Takeuchi, Yoshikawa, & Ishii 
(2000) (TYIOO) and Ilbert et al. (2004) (104), which have provided detailed comparisons 
and, therefore, useful insights of selected LF estimators. Such comparisons are vital and en- 
able us to gain a practical sense of the limitations and any intrinsic bias of these techniques 
in a more controlled environment. 

8.1 Heyletal. (1997) 

As already discussed in the previous section, H97 extended the EEP88 SWML (which was 
referred to as the SSWML to make the distinction) which could account for multiple sam- 
ples with varying magnitude limits. The paper was also concerned with exploring the role 
of evolution in galaxy surveys. This discussion is focussed on the first half of this paper 
where they compare the SSWML with the l/Vmax estimator using simulated data. The 
Monte Carlo (MC) samples drew magnitudes from a Schechter function adopting a standard 
cosmology. For simplicity, Ti'-corrections were not considered. With this basic set up, the 
following scenarios were explored. 

8.1.1 Case 1 - Density evolution: 

In the first scenario they constructed a sample of 1800 galaxies, binned equally over 6 ap- 
parent magnitude ranges with an overall magnitude range between 11.0 < m < 24.0 mag. 
To give a simplistic model of density evolution they doubled the density of galaxies beyond 
z = 0.2. 

Their findings in this case showed both estimators were very consistent with each other, 
with the exception of the SSWML returning smoother results. It was concluded that neither 
indicated any intrinsic bias with respect to the input LFs. 

8.1.2 Case II - Clustering: 

Potential intrinsic biasing due to clustering (particularly in the 1 /Vmax method) was then 
explored using three MC samples with over densities built in at jz = 0.05 and with each 
sample having clustered fractions of 45%, 65% and 85%. By exploring various bin width 
sizes they found that for very narrow bin widths both estimators were biased at the faint-end 
slope of the LF, over predicting the density. However, as the bin widths increased 1/Vmax 
remained unchanged but the SSWML method showed more robustness to the clustering 
showing a systematic decrease in bias. For the three catalogues both estimators did show an 
increase in steepness of the LF with the SSWML being less affected (see Figure 8). It was 
therefore concluded that the SSWML was a superior estimator to that of Knax for surveys 
with strong clustering. 

8.2 Willmer (1997) 

In this paper, the author compares LF parameter values derived from the Choloniewski 
(1987) version of Lynden-Bell's C~ method, the MLE of STY79, the Turner (1979) ct>/$ 
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Fig. 8 Plots courtesy of Heyl et al. (1997). Selected results from H97 (Case II in our text) where increasing 
levels of clustering in the MC samples are shown. The top-left panel has no clustering, the remaining panels 
are the result of 45% (top right), 65% (bottom left) and 85% (bottom right) clustering. 



method, the SWML estimators of Choloniewski (1986) and EEP88, and finally Schmidt's 
1/Knax estimator. All were applied in two different cases: the first using MC simulations 
(based on the CfA survey) and the second to the CfA-1 (Huchra et al. 1983) survey data. 

8.2.1 Case 1 - Monte Carlo (MC) simulations 

In this case a total of 1000 MCs were drawn from a Schechter function (Schechter 1976) 
using a homogeneous redshift distribution and inputting three different values of a (i.e. 
oiin = —0.7,-1.1 and -1.5 ) in order to probe sensitivity of the LF estimators to the in- 
clination of the faint-end slope. Finally, the MCs were designed to have a similar survey 
geometry as the CfA- 1 redshift survey. 

It was demonstrated that all of the estimators under comparison recovered the values 
extremely well with the exception of 1/Vmax (see Table 2). Overall the STY79 and C~ 
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Table 2. Extract from Willmer (1997) reproduced by permission of the AAS. Median 
values for recovered parameters M* and a, derived from 1000 CfA-1 like Monte Carlo 



simulations. 


M* 


a 


M* 


a 


M* 


Input Values 


-1.50 


-19.10 


-0.70 


-19.10 


SWML 


-L45 ± 0.14 


-19.13 ±0.11 


-0.82 ±0.14 


-19.19 ±0.12 


STY 


-1.43 ± 0.06 


-19.06 ±0.07 


-0.69 ± 0.08 


-19.10 ±0.06 


Choloniewski 


-1.50 ±0.08 


-19.00 ±0.11 


-0.87 ±0.15 


-19.08 ±0.12 


Turner 


-1.51 ± 0.09 


-19.16 ±0.10 


-0.66 ±0.11 


-19.11 ±0.09 




-1.67 ± 0.05 


-19.14 ±0.09 


-0.94 ± 0.08 


-19.14 ±0.07 


1/VmaxziM 


-1.50 ±0.06 


-18.98 ±0.07 


-0.83 ±0.11 


-19.04 ±0.09 




-1.51 ± 0.07 


-19.11 ±0.08 


-0.72 ±0.12 


-19.10 ±0.07 



Table 3. Extract from Willmer (1997) reproduced by permission of the AAS. Schechter 
function parameters for CfAl survey, using same limits for all methods 



Method 


a 


M* 




Notes 


SWML 


-1.20 ±0.03 


-19.30 ±0.04 


AM 


= 0.25 


STY 


-1.11 ±0.08 


-19.17 ±0.08 






Choloniewski 


-1.18 ±0.05 


-19.26 ± 0.07 


AM 


= 0.25 


Turner 


-1.11 ±0.06 


-19.32 ± 0.05 


Az = 


500 kms-l 




-1.59 ±0.04 


-19.43 ± 0.07 


AM 


= 0.25 




-1.70 ±0.05 


-19.55 ±0.05 


Az = 


500 kms-l 


c- 


-1.20 ±0.01 


-19.21 ±0.01 







methods were shown to be the most robust providing the best results. They show that despite 
having a homogeneous sample, 1 /Vmax indicated bias, giving higher values for the faint-end 
slope compared to the other estimators under test. However, as we shall see in the following 
section, TYIOO applied the modified Bales (1993) Vmax variant to a homogeneous sample 
and demonstrated its consistency with other estimators. In Willmer et al. (2006) they apply 
this variant to the Deep Extragalactic Evolutionary Probe 2 (DEEP2) survey data to avoid 
potential bias. 

8.2.2 Case II - The CfA-1 redshift survey 

The estimators were then tested against the actual survey data from CfA-1 (see Table 3). 
The results indicated that all the estimators gave consistent results with a = —1.2 and 
M* = —19.2 mag with the exception once again of the Vmax • In this case they noted that 
the density measured was lower than STY79 by a factor of 2. 

Overall, it was found that he STY79 and the methods gave the best most robust 
results. However, they did note that STY79 fit underestimated the faint-end slope whereas 
the bias observed in l/Vmax showed overestimating the faint-end slope. Therefore, for sce- 
narios where samples have a steeper slope, the C~ method would be a preferred choice of 
estimator. 
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8.3 Takeuchi, Yoshikawa, & Ishii (2000) 

Undoubtedly this article provides the most rigorous comparisons to date. Four estimators 
were put under test. This first was the Bales (1993) variation of the l/Vmax estimator 
which traces evolution with redshift. The second method was a variation on the C~ es- 
timator which they refer to as the Lynden-Bell-Choloniewski-Catditz-Petrosian (LCCP) 
method, which incorporates the Choloniewski (1987) extension and the smoothing kernel 
introduced by Caditz & Petrosian (1993) (see Equation 79). The third and forth methods 
were the Choloniewski (1987) and the EEP88 respective variations of the SWML estima- 
tors. They consider the following three distinct scenarios. 

8.3.1 Case I - Simulated luminosity functions 

The four LF estimators were applied to three different types of mock catalogue using the 
following LFs: 

- Uniform distribution. 

- Power-law that increases towards faint magnitudes, 

- Power-law that decreases towards fainter magnitudes, 

- Gaussian distribution, 

- Schechter function with a = —1.1, implying a flat faint-end slope, 

- Schechter function with a = —1.6, implying a steep faint-end slope. 

The MC simulations were created with the following spatial density features: 

- A homogeneous distribution. This was largely to explore any bias that may have been 
inherent in the 1/Knax as originally claimed by W97. The sample was created neglect- 
ing the J\ -correction, with a redshift limit of z = 0.1 and an apparent magnitude limit 
"T-iim = 13.0 mag. 

- Two inhomogeneous samples. In the first, half the galaxies in the sample lie within a 
dense cluster at a distance of 0.8 Mpc with a radius of 0.8 Mpc. The second mocks an 
overall under-density with a spherical void with a radius of 1.6 Mpc at a distance of 0.8 
Mpc. Both scenarios have the same overall density as the homogeneous sample. 

An MC catalogue was then generated with sample sizes of 100 and 1000 to observe the 
effects of Poisson noise when one is sampling from small data-sets. 

The results from this part of the study revealed that in the homogeneous case all estima- 
tors gave results consistent with each other, thus contradicting earlier claims by W97, where 
they found a bias in their results when applying the original Schmidt 1/Knax. 

In contrast, the clustered MC sample showed that 1 /Knax was heavily biased produc- 
ing an overestimation in the recovered LF - as would be expected. Moreover, the results 
showed the other three estimators were not adversely affected by this inhomogeneity and 
demonstrated good agreement with all the input LFs. However, larger error bars were ob- 
served when applying the Choloniewski method which implied shot noise was beginning to 
dominate for the smaller of the sample sizes. An excerpt of these results in Figure 9. 

8.3.2 Ca.se II - 2dFGRS MC catalogue 

In this case they apply three of the four LF estimators to a MC 2dFGRS sample prepared by 
Cole et al. (1998) - it is unclear why the LCCP method was excluded from this part of the 
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Fig. 9 Extract from Takeuchi, Yoshikawa, & Ishii (2000) reproduced by permission of the AAS. An Example 
from TYIOO (Case I) of the effects from applying the estimators on a homogenous MC sample (left) and a 
clustered MC sample (right). On the left hand plots we observe all four estimators are consistent with each 
other with no indication of intrinsic bias. However, the clustered sample on the right-hand panel shows how 
the expected bias of l/Vmax affects determination of the faint end. From this panel, there is no evidence of 
clustering intrinsically biasing the remaining estimators. The input luminosity function in both scenarios for 
this example was a uniform distribution. 



analysis. In general they found that all estimators gave consistent results compared with the 
input LF (see the top panels in Figure 10). They did, however, report slight deviations in the 
l/Knax estimator which were considered to be due to the clustering in the sample. 

8.3.3 Case III - Hubble Deep Field (HDF) 

All four estimators were applied to the photometric redshift HDF (Femandez-Soto et al. 
1999; Williams et al. 1996) to probe evolution of the LF shape. The overall sample size 
under test consisted of 946 galaxies. Whilst they provide extensive details of the intrinsic 
evolutionary properties found in this study, overall they reported that all four 4 estimators 
gave consistent results with a distinct evolutionary trend in the LF with redshift. An example 
of these results is shown in the bottom panels of Figure 10. It has been noted that rudimen- 
tary determination of A"-corrections adopted by TYIOO in this analysis may have adversely 
affected the overall accuracy of LF determination for all of the estimators applied in the 
HDF results (Takeuchi, private communication). However, as we shall see in the following 
section, Ilbert et al. (2004) provided a rigorous study of the effect of K-correction bias on 
LF estimators. 

Finally, an interesting aside in this analysis compares the algorithms for efficiency. They 
found that the Choloniewski method was the fastest, whereas the EEP88 method required 
more iterations and Vmax required the maximum volume to be calculated for each galaxy 
which both contributed to a slower computational time. 
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Fig. 10 Extracts from Takeuchi, Yoshikawa, & Ishii (2000) reproduced by permission of the AAS. Top 
panels: results from Case II where 2dFGRS MC samples were tested. They showed that considering the 
samples in redshift space (left) and real space (right) had little adverse effect on determining the LF. Both 
panels show consistency between the estimators under test compared to the input LF. Bottom panels: example 
of results from Case III where the estimators were applied the HDF data. In the context of this review it can 
be seen that in the redshift ranges considered in both panels, once again, all estimators seem to show overall 
consistency with each other 



8.4 Ilbert et al. (2004) 

In contrast to the previous studies examined in this section, 104 explored a specific intrinsic 
bias inherent in the estimation of the global LF. The bias, which is more sensitive toward 
the faint end of the LF, arises when the wavelength in the selection filter differs significantly 
enough from the wavelength of the reference filter. For a given data-set comprising different 
galaxy types having different SEDs will consequently lead to different absolute magnitude 
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limits within which a given population could be observed at a given redshift for a fixed 
apparent magnitude limit. Figure 1 1 illustrates such a scenario. 

A relationship can be established that relates the redshift z, the effective wavelength 
of the selection filter, As, and the effective wavelength of the reference filter, \Ref which 
allows us to probe the absolute magnitude limit, A/j^j^"*, for a given reference filter, as a 
function of redshift for different galaxy types. In this example the galaxy types are irregu- 
lars, spirals and ellipticals which can be generally referred to ranging, respectively, from the 
'blue' populations to 'red' populations. The three panels in Figure 1 1 represent three differ- 
ent cases exploring the relationships shown in Equation 127 within a given redshift range, 
^loio < -2^ < Zhigh, corresponding to 0.7 < z < 1.25. 

1 + Zlow { ~ Ki/XRef (127) 
> Xs/\Ref 

Each case comsponds to observing the three galaxy populations within different reference 
filters which can be summarised as follows. 

1- l + ^;ou) < ^Iab /^VVhst- 

In the top panel of Figure 1 1 Mj^™* is defined for the UV Ref filter and is conse- 
quently brighter for blue galaxies than for red. Thus, in the top panel of the figure the 
faint irregular SEDs becoming increasingly unobservable within the absolute magnitude 
range where the other spiral and elliptical galaxy types can still be detected. In this fil- 
ter within the redshift range considered, both irregular- and spiral-type galaxies are not 
observable beyond absolute magnitudes fainter than A/j*™* = —14.2. However, the el- 
liptical galaxies remain observable out to A^^^"* = -10.0. Therefore, one would expect 
the global estimation of the LF would become biased between -10.0 < Afi*-™* < 14.2. 

In the middle panel of Figure 1 1 Afj^™' is now defined for the B We/filter and is approx- 
imately the same for all galaxies for the given B-band reference filter within the redshift 
range considered. In this case one would expect minimal bias effect as most galaxy pop- 
ulations would remain observable out to the faintest absolute magnitude limit. 

3- 1 + 2;oiu > ^Iab I -^iiisT • 

Finally, observing galaxies in the I Ref filter as show in the bottom panel of Figure 1 1 
implies that Af/-™* is now brighter for red galaxies than for blue and therefore faint red 
galaxies are missing from the sample. Opposite to Case 1, elliptical galaxies are now 
unobservable within the absolute magnitude range where irregulars would still be de- 
tected. However, as can be seen, this bias should be less prevalent than in Case 1 since 
the maximum absolute magnitude range this effects is only in order of ~ 1 mag. Thus, 
where all SEDs have the same absolute magnitude limit, no bias should be present in 
the recovered LF. 
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Redshift 



Fig. 1 1 Extracts from Ilbert et al. (2004) reproduced by permission. To illustrate how the bias creeps into 
the global LF consider the three cases in the above panels. In the top panel a galaxy distribution is shown 
consisting of irregulars (crosses), spirals (solid triangles) and ellipticals (open circles). For each galaxy type 
SED. the corresponding faint observable absolute magnitude limit Afj'^^"' is shown as a function of redshift. 
The vertical dashed lines represent the redshift region 0.7 < z < 1.25 within which the bias is studied. The 
section filter is / with Iab < 26 mag and the reference filters are respectively shown from top to bottom as 
UV FOCA (2000 A), B HST (4500 A), I HST (8140 A). 



To explore the impact of this bias, 104 apply the l/Vmax, C+, STY79 and the EEP88 es- 
timators to both real and simulated data across these reference filters. For this review only 
results pertaining to the UV and B filters are summarised which are, respectively, shown 
as the top and bottom panel sets in Figure 12. The simulated mocks were drawn from the 
multi-colour mock catalogues that are based on an empirical approach using observed LFs 
to derive redshift and apparent magnitude distributions [see 104 and Arnouts et al. (1999) 
for more details]. In essence, they generate two sets of mock catalogues which have been 
classified into three main spectral classes: irregulars, spirals and ellipticals. In the first set 
of mocks (corresponding to the left-hand panels in each set of Figure 1 2) they use only one 
SED per spectral class. This is to compare against the second set of more realistic mocks 
where objects have been drawn from an interpolated set of 72 SEDs (the middle panels in 
each set of Figure 12). In this case one spectral class can correspond to multiple SEDs and 
therefore the bias may be evident within a single spectral class. 

For the real data sample, they used the photometric catalogue and photometric redshifts 
from the Hubble Deep Field (HDF) North and South surveys (Arnouts et al. 1999, 2002) 
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MOCK (3 SEDs) MOCK (72 SEDs) HDF-N + S (72 SEDs) 




M,b(UV) 



MOCK (3 SEDs) MOCK (72 SEDs) HDF-N + S (72 SEDs) 




M.b(B) 

Fig. 12 Extracts from Ilbert et al. (2004) reproduced by permission. Examples of how this form of obser- 
vational bias affects the recovery of the global LF for different estimators in two different reference filters 
-UV (top panel set) and B-HST [4500A] (bottom panel set). For each panel set the results for the simulated 
mock data are shown in the left and middle plots. The LFs from HDF-North and -South are shown in the 
right-hand plots of each set. The resulting LF estimates for each estimator are indicated on each panel set. 
The LFs corresponding to the three input SEDs used are shown as dotted lines on the left-hand plots of each 
panel set: from the steepest to the shallowest slope, iiTegulars, spirals and ellipticals LF, respectively. The 
global simulated LF (i.e. the sum of the three input LFs) is shown as a solid line. 
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(the right-hand panels in each set of Figure 12). For consistency, they have used the exactly 
the same set of 72 SEDs as in the mock sample. 

Before discussing their findings, the remainder of the setup of Figure 12 is discussed 
first. The upper and lower plots in each panel set are the recovered LFs in the respective 
redshift ranges 0.70 < z < 1.25 and 1.25 < z < 2.00. The first of these ranges corresponds 
to the range shown in Figure 1 1 . The left-hand plots show the input LFs for each spectral 
type in the mock samples, indicated by the dotted lines. From the steepest to the shallowest 
slope correspond to irregulars, spirals and ellipticals. In each plot the global input LF is 
shown as a solid line. Finally each LF estimator is indicated by a: dashed line (STY79), 
triangles (SWML EEP88), squares (C"*" -method) and circles (l/Vmax). In each case the 
global LF is determined. 

8.4.1 Case 1 analysis - the UV reference filter: 

Due to the nature of how each LF estimator is constructed, the resulting bias affects them 
in different ways. Examining firstly the top panel set corresponding to the reference filter, 
UV, we can see clearly that all LF estimators underestimate the global input LF. In the mock 
data for the first redshift range (top panels), the C^-method and 1 /Vmax methods follow the 
brighter end of the LF well but then show a sharp drop beyond M ~ —14, at which point 
they seem to recover the input LF of the elliptical samples. A similar trend is observed for the 
more distant redshift range. This bias seems entirely consistent with Case 1 described above 
and shown in the top panel of Figure 11. Whilst the SWML also shows a drop off at this 
magnitude it is less biased, recovering more the shape of the input elliptical LF as opposed 
to the LF itself. Contrastingly, the parametric STY79 method shows only a marginal short- 
fall in the recovered LF at faint magnitudes, and, moreover, in the left-hand bottom panel it 
seems to recover the input global LF exactly. 

Whilst the HDF data in the right panels do show a drop off for all estimators toward the 
faint end of the LF, the impact is significantly less than in the mock data. It was thought that 
this was due to the very small sample size of the data and thus it was concluded that in this 
reference frame the global LF is not wholly recoverable. 

8.4.2 Case 2 analysis - the B reference filter: 

As already discussed and shown in the bottom panel of Figure 1 1, in this reference frame 
the absolute magnitude limits for the spectral types are very similar across the redshift range 
indicated. Thus in the top plots of the middle panel set in Figure 12 it is clear that all LF 
estimators recover the global LF extremely well. In the bottom plots of this set we can ob- 
serve a slight overestimation in the SWML and STY79 estimators toward the faint end. This 
is consistent with the middle panel of Figure 1 1 where at this redshift range, the absolute 
magnitude limits for each spectral type begin to show stronger divergence. 

The HDF LF is fairly recovered in the filter with only slight underestimation in STY79 
at 0.70 < z < 1.25 and slight overestimation at 1.25 < z < 2.00. 



This type of analysis into the robustness of the traditional LF estimators provides a useful 
tool for probing the deep survey sample where samples may be more prone to biasing from 
variable magnitude limits resulting from large A'-corrections across spectral types. The au- 
thors conclude that since the STY79 and SWML methods differ in their biasing (beyond 
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Poisson errors) compared to the 1/Knax and estimators can be a good indicator that 
either one has bias in recovering the global LF. 

Several possible ways to reduce this bias entering into LF estimation are offered. These 
include selecting galaxies subsamples in the closest rest-frame filter to the reference filter 
as performed by e.g. Poli et al. (2003). However, this requires multi-colour information to 
be able to derive the same rest-frame band LF at different redshifts. Another possible route 
applies more to large survey data. This would require estimating the global LF within the 
absolute magnitude range in which all galaxy types are detected. Of course, this approach 
would require the cutting of a percentage of the data. 



9 Emerging Generalised Methods 

Despite the continuing popularity of such methods as 1/Vmax, MLE and the SWML, there 
has been renewed exploration into more innovative statistical approaches. These new meth- 
ods are largely motivated by the potential hazards and pitfalls inherent with the current 
traditional approaches. I now examine in more detail three such approaches to estimating 
the LF that have emerged over the last few years. The first is a semi-parametric approach by 
Schafer (2007), the second is a Bayesian approach by Kelly et al. (2008), and the third, by 
Takeuchi (2010), applies the copula to construct the bivariate LF. 



9.1 Schafer (2007) - A semi-parametric approach 

This approach is statistically rigorous and considers data-sets that are truncated i.e. flux 
limited. There can be potential advantages for this method summarised as follows. 

1. No strict parametric form is assumed for the bivariate density. 

2. No assumption of independence between redshift and absolute magnitude is made. 

3. No binning of data is required. 

4. A varying selection function can be incorporated. 

By not assuming separability Schafer decomposes the bivariate density 4>{z, M) into, 

\og^{z,M,e) = f{z) + g{M) + h{z,M,6), (128) 

where h(z, M, 9) has a parametric form that incorporates the dependency between red- 
shift, z, absolute magnitude, M, and the real valued parameters, 9, by folding in, for ex- 
ample, evolutionary models. The functions t{z) and g{M) are, however, determined non- 
parametrically. He then incorporates an extended form of the maximum likelihood approach 
called the 'local' likelihood estimator for the density estimation and applies this to 15,057 
quasars from Richards et al. (2006). This semi-parametric approach has the advantage of 
allowing the user to estimate evolution of the LF with redshift without assuming a strict 
parametric form for the bivariate density. The only parametric form required is that which 
models the dependence between redshift and absolute magnitude. Moreover, it should be 
noted that this method assumes a complete data-set. 
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9.1.1 Estimating the local likelihood density: 

This approach is a non-parametric extension of the MLE where one assumes the data X = 
{Xi , X2, Xn) are observations of independent, identically distributed random variables 
from a distribution with density /q. The MLE (Jmle) for /o is defined as the / G iP, where 
T denotes the class of candidates for /o, and is maximised as 



Y.^ogf{X,) 



(129) 



From this, one can localise the likelihood criterion and thus construct the final local likeli- 
hood fi^i^ estimator by smoothing the local estimates giving 



K*(x, u, X)fu{x) 

.ties 



y^ K* {x, u, A) 

.u£g 



(130) 



where Q forms a grid u G CJ of equally spaced values (between -3 and 3 in the authors 
example) of a Gaussian density with mean zero and variance of unity. The term, K* (x, u, A), 
is therefore a kernel function such that 



Yk*{x,u,X) = 1 \fx. 



(131) 



By making Q sufficiently large, the amount of smoothing is completely dominated by the 
kernel function parameter, A. 



9.1.2 Extending to flux-limited data 

The local density likelihood is incorporated into the case for flux-limited survey data where 
one can include the dependence between the redshift, z, and absolute magnitude, M. A first 
order approximation of h is made from Equation 128 such that 

h{z,M,e) =ezM. (132) 

After an extensive derivation, a global criterion for the likelihood is found to be given by 

'C*{f,g,z,M,e) = Ywj I Y K*{zj,u,X)au{zj) 

j=l \u£G 

+ K*{Mj,v,X)MMj) + h{zj,Mj,e) 



ueQ 



exp{h{z,M,9)) 



J2 K*{M,v,X)exp{b^{M)) 



.uea 



Y K*{M,v,X)exp{a^{M)) 
.uea 



dM dz 



(133) 



where au(z) and bv(-M) are degree p polynomials which form part of the smoothing term 
of K* for local estimates. A defines the region outside of which the data are truncated on 
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Fig. 13 Extract from Schafer (2007) reproduced by permission of the AAS. Estimates of the luminosity func- 
tion at different redshifts (black solid curves and error bars), compared with estimates from Richards et al. 
(2006) (light grey solid curves and en'or bars). En'or bars represent one standard eiTor and account for statis- 
tical errors only. 



the (z, M) plane. The quantity wj is a weighting to take incompleteness into account and is 
defined as the inverse of the selection function. 

Estimating the LF in this way has the advantage of allowing the user to estimate the 
evolution of the LF without assuming a strict parametric form of the bivariate density. The 
method was applied to quasar data from Richards et al. (2006) and compared to their results 
which applied the Page & Carrera (2000) version of 1/l^max (see page 27). These results 
shown in Figure 13 are in good agreement with Richards. 
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9.2 Kelly et al. (2008) - A Bayesian approach 

In Andreon (2006) a Bayesian approach to constraining the LF is described by considering 
the color-magnitude distribution for both cluster and background galaxies (see also the end 
of § 4). This section now considers a second Bayesian technique to estimate LF developed 
by Kelly et al. (2008). In this paper they derive a likelihood function of the LF that relates 
observed data to the true LF (assuming some parametric form). They then use a Bayesian 
framework to estimate the LF and the posterior probability distribution of the LF parameters 
via a mixture of Gaussian functions. By modelling the LF using Gaussian functions, they 
circumvent the problem of having to assume a parametric form and allow for a flexible 
fitting method. 



9.2.1 Estimating the LF likelihood: 



The form of the likelihood function that they adopt for the LF estimation is derived from a 
binomial distribution. Whilst they highlight that the traditional approach of using a Poisson 
distribution is incorrect, they show that as long as the survey's detection probability is small, 
both approaches yield the same results. We recall the relation of the LF to the probability 
density of (L,z) can be written in the following separable form: 

p{L,z) = ^^{L,z)p[z), (134) 

where L is the luminosity, z is redshift and N is the normalisation set as the total number of 
objects in the observable Universe. From this starting point, the authors assume a parametric 
form for 4>{L,z), with parameters 6 and show that the observed data likelihood function is 
given by, 

p{Lobs,Zobs,^\e,N)^C?,[p{I = mf''" n P(i-^^l^)' (135) 

where 

TV 

p{L,z\e) = Y\p{L,,z,\e) (136) 



observed in a given survey. By adding in sample selection, the probability that the survey 
misses a source, given by the parameters 6, is 



p(/ = 0|6')= j j p{I = Q\L,z)p{L,z\e)dL dz (137) 

In Equation 135, 1 is a vector of size taking on values 

1 if ith source is included in survey 



otherwise 



Finally the term Cn = N\ / n\[N ~ n) \ is the binomial coefficient and Aobs is the set of n 
included sources. Equation 135 thus represents observed data likelihood function given an 
assumed LF that is parameterised by 6. 
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Fig. 14 From Kelly et al. (2008): Illustration of the prior distribution for the mixture Gaussian function 
(GF) approach to describing the LF with K = 5 GFs. For the two cases shown, each has the resulting 
marginal distributions of log z and log L on the respective top and right side of each plot. The left-hand panel 
illustrates unimodal prior assumed in the paper where the GFs are close together When the GFs are further 
apart (light-hand panel) the resulting LF is multimodal. 



9.2.2 The posterior probability distribution 

It is then shown that, for a given set of observed data, the joint posterior probabihty distri- 
bution of 6 and A*" for the LF is given by 

p{e,N\Lobs,Zobs) (xp{N\e,n)p{e\Lobs,Zobs), (138) 



9.2.3 Model for the LF 

To model the LF they adopt a similar approach as applied by Blanton et al. (2003b), where 
a mixture of Gaussian functions were used to accurately estimate the LF. In this case, to 
minimise the number of Gaussian functions (GFs) required to describe the LF, they consider 
log of the joint distribution of L and z. Equation 134 is now re-written as 



p{L,z) 



p{\ogL,\ogz) 
Lz(lnlO)2 ' 



(139) 



Moreover, unlike Blanton et al. (2003b), they allow the widths of GFs to vary and do not fix 
the centroids to like on a grid of values. This also adds to minimising the number of GFs 
required. The mixture of K Gaussian functions can be written as 



p{\ogLi,\ogZi\ix,^i,E) = ^ 



fe=i 
X exp 



(140) 



where 9 = (n,fj,, E), J2^^''^k = 1, x; = (log L^, log a^), /xj, is the 2-element mean position 
vector for the fcth Gaussian, is the 2x2 covariance matrix for the fcth Gaussian, and 
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denotes the transpose of x. The model for the LF is given by 




(141) 



K 



xE 



k=l 



Another crucial aspect to using the mixture model is the form of its prior distribution. Essen- 
tially, it is assumed, in general, the LF will be unimodal i.e. it does not show distinct peaks 
(see Figure 14). Consequently, only the prior distribution has a parametric form allowing 
the GF widths to be closer together and thus aiding the convergence of the Markov Chain 
Monte Carlo (MCMC). The form of this prior is given by 



where Cauchy2(/ifc|/i0 5 r) refers to a two-dimensional Cauchy distribution as a function 
of /ife, with mean vector /iq and scale matrix T. The Inv — Wisharti(£'fe|A) is the inverse 
Wishart density as a function of 17 j,, with 1 degree of freedom and scale matrix A. 
Finally, the total joint posterior distribution is shown to be given by 

p{e, iV, MO , ^1 log Lobs , log Zobs ) « p{N\en)p{e, , ^| log L^bs , log z^bs ) (143) 

MCMC is then applied using the Metropolis-Hastings algorithm (MHA) (Metropolis & Ulam 
1949; Metropolis et al. 1953; Hastings 1970) for obtaining random draws of the LF from the 
posterior distribution. Given a suitably large enough number of Gaussian functions it is flex- 
ible enough to give an accurate estimate of any smooth and continuous LF. They found that 
isT ~ 3 to 6 was generally a suitable number. 

This approach was applied to MC simulated quasars derived from a Schechter function 
and separate simulated SDSS-DR3 quasar data based on Richards et al. (2006). Figure 15 
shows the resulting LF at different redshifts for the latter sample. For the Schechter MC sam- 
ples they concluded that the mixture model proved to be flexible enough to approximate the 
true Schechter function form, but was not as accurate as if one had fitted the correct para- 
metric model. Moreover, they showed that one could just as easily substitute the mixture 
model for any parametric form of the LF into the likelihood function and posterior distribu- 
tion. The Bayesian mixture of Gaussian functions model is able to accurately constrain the 
LF, even below the survey detection limit. 

9.3 Takeuchi (2010) - Copula and correlation for the Bivariate LF 

Takeuchi adopts a technique for BLF construction that uses the copula function to con- 
nect two distribution functions and defines nonparametric measures of their dependence. To 
demonstrate this technique the FUV-FIR^^ BLF is constructed by adopting the Saunders et al. 
(1990) LF shown in Equation 13 for the IR, and a Schechter function shown in Equation 5 
for the UV. However, to summarise this technique some of the framework underpinning the 
copula definition is firstly introduced. 



K 



p{iT,ii,i:,fj.o,A) oc Yl P(MfclMo,'^)p(^fc|^) 
fc=i 

K 



(142) 



^ FUV-FIR refers to the far ultraviolet and the far infrared 
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log vL^ [erg s"'] 

Fig. 15 Extract from Kelly et al. (2008) reproduced by permission of the AAS. The input LF derived from a 
simulated sample is shown as a solid red line. The dashed line represents the posterior median estimate of the 
LF based on the mixture of Gaussian functions model. The shaded region is 90% of the posterior probability. 
The solid vertical line shows the flux limit of the sample at each redshift. 



9.3.1 Sklar's theorem 

Sklar (1959) created a new class of functions which he called copulae^. In the most general 
case, Sklar's theorem states that if xi, X2, xn variables of H, an n-dimensional distribu- 
tion function with marginal distribution functions (or 'marginals') Fi, F2, F„, then there 
exists an n-copula C such that 

H{xi,X2,...,Xn) = C[Fi{xi),F2{x2),...,Fn{xn)] (144) 

The above relation has the property that if the marginal distributions F are continuous then 
the resulting copula C is unique. However, if this is not the case then the copula is unique 
on the range of values of the marginal distributions on Range Fi x Range F2 x .... x Range 
Fn, where Range Fi is the range of the function Fi. 

The implementation of the copula in this work reduces the above to a 2-dimensional 
bivariate case where H is defined as 

H{xi,X2) = C[Fi{xi),F2{x2)]. (145) 

This theorem gives the basis that any bivariate distribution function with given marginals can 
be expressed in this form. Therefore, in essence, the copulae connect the joint distribution 
functions to their one-dimensional margins. 



Copula is Latin for "a link" or "tie" 
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9.3.2 Bivariate copula definition 

The copula itself is defined on the unit n-cube[0,l]" having uniformly distributed marginal 
distributions. For the 2-dimensional case, the copula is a function C : [0, 1]^ — >■ [0, 1] which 
has the following properties: 

1. C is grounded: that is, M u,v £ [0, 1], C{u, 0) = and C(0, v) = 

2. Cis2-increasing: thus, Vtii,ti2,wi,i'2 G [0, 1] such that tii <ii2and?;i < U2 , yielding, 

C{u2,v2) - C(m2,wi) - C(ui,«2) + C(mi,i;i) > 

3. 'iu,v£ [0,1], 

C{u,l)=u and C{l,u)=« (146) 

9.3.3 Boundaries of the copida 

Lastly, the Frechet-Hoeffding lower and upper bounds that are defined for every copula C 
and every (u,v)£ [0, 1]^ are such that 

max(?i + V — 1, 0) < C {u, v) < min{u, v) (147) 

9.3.4 Constructing the copula for the BLF 

There are a number of forms the copula can take and Takeuchi explores two of them: the 
Farlie-Gumbel-Morgenstem (FGM) copula and a Gaussian copula. For illustration of the 
method and simplicity only the Gaussian form is considered here. To measure the depen- 
dence between the two distribution functions Fi{xi) and ^2(3:2) the linear Pearson product- 
moment correlation function is applied. However, it is noted that to explore the non-linear 
dependencies, the Kendal r (Kendall 1938) or Spearman's ps (Spearman 1904) estimators 
would be preferred. 

The Gaussian copula C'~^ is defined as 

C^{ui,U2;p) = $2[$l\ui),$^\u2)\p\ (148) 

where $1 is the standard normal CDF (with mean zero and variance of unity) and $2 is the 
bivariate CDF with correlation p. By differentiating C'~^ w.r.t ui and 112 one obtains the C*^ 
density function shown to be given by 

c<=(tti,^2;p) = ^/=L^exp|-i (149) 

where <P^^ = <P~^(ti2)]"^ and / is the identity matrix. E is the covariance matrix 

given by 

Thus if we now denote the univariate LFs as (j>^^\Li) and 9!)2^-'(Z/2) then the bivariate PDF 
4>^'^\Li, L2) is described by a differential copula as^ 

0(2) (ii,L2) = c (Li),4') (L2)] <^1') {L2) (151) 

^ It should be noted that Equation 151 is the correct form for the expression of </>'^' (Li , L2). The equiv- 
alent expression given by Equation 32 in Takeuchi (2010) contained a typographical error. 
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Fig. 16 Extract from Takeuchi (2010): The analytical BLF constructed with the Gaussian copula with model 
LPs of UV- and IR-selected galaxies. The BLFs are normalised so that integrating over the whole ranges of 
LI and L2 gives 1. The linear correlation coefficient p varies from 0.0 - 0.9 from top left to bottom right. The 
contours are logarithmic with an interval A log (f)^'^^ =0.5 drawn from the peak probability. 



Thus, under the Gaussian copula the BLF is finally given by 

0(^)(Li, L2; p) = ^=L= exp |-i [w-'\e-' - I)^-'] I X 0(i)(Li)4^)(L2) 

(152) 

where 

#-i=[^.-i(0W(Li)),f-^(4'^(^2))]^ (153) 

Figure 16 shows a summary of results from the above application of the method using the 
Gaussian copula to construct the FUV-FIR BLF. The marginal distributions in this appli- 
cation are given as a standard Schechter LF (Equation 5 on page 3) for the FIR LF and 
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the log-Gaussian LF (Equation 13 on page 4) as defined by Saunders et al. (1990) for the 
FIR LF. It is demonstrated that if the linear correlation between two variables is strong the 
Gaussian copula is a useful way in which to construct this type of distribution function 
since it connects two marginal distributions and is directly related to the linear correlation 
coefficient between the two variables. 



10 From Tests of Independence to Completeness Estimators 

This review has had a thorough look at many of the non-parametric and parametric methods 
used to determine LFs. However, as discussed § 1 the fundamental assumption for the ma- 
jority of techniques is the separability of the density function and the LF that allows us to 
write the probability distribution of observables as 

${M,Z) = (j)[M)p{Z) (154) 

The quantity Z now refers to the distance modulus which is used instead of redshift through- 
out this section. The distance modulus is related to redshift and magnitude in the following 

way, 

Z = m-M = 51ogio[rfL(2, O^, Oa, Hq)] + 25 (155) 

This is a fundamental and crucial assumption that is generally accepted over small red- 
shift bins (or shallow redshift surveys). However, what is sometimes overlooked is whether 
the assumption of separability is valid. That is, after all corrections have been made for 
e.g. evolution, J\-correction etc., are the observables absolute magnitude A/ and distance 
modulus Z, of Equation 154 statistically independent? 



10. 1 Efron and Petrosian independence test 

In order to test this assumption of independence (or separability) between two variables, 
Efron & Petrosian (1992) developed a simple ranked-based non-parametric test statistic for 
data-sets with a single truncation. Their method is, in principle, an extension of the C~ 
method. For this technique Efron and Petrosian construct the Kendall (Kendall 1938) test 
statistic, r, based on the rejection of independence between the random variables M and 
Z under test. Although they give a very detailed and formal explanation, only the main 
points are summarised here. To illustrate this test refer back to Figure 7 on page 30, which 
schematically shows the construction of the C~ method. 

If, for the moment, we imagine the most simplest scenario where we are not hampered 
by an apparent magnitude limit ml^^, then M and Z would be independent assuming we 
have a complete sample. Consequently, the rank, Ri, of M,, would be uniformly distributed 
between 1 and the number of galaxies in the set, A'g^i, with respected expectation value and 
variance, 

£;=i(iV+l), F = -L(iv2_i). (156) 
The quantity T for each galaxy is then defined as 



V 



(157) 
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Fig. 17 Schematic illustrating the construction of the Efron and Petrosian (1992) test of independence. 
In order for this test to be applied correctly it is crucial to account for the presence of the faint apparent 
magnitude hmit, ^{^^ which introduces a correlation between M and Z. Thus, drawing from the basic ideas 
of the C'^ method, it is possible to construct a separable region, S, for each galaxy located at (A/^, Z^) within 
which one can now can construct the r statistic of independence. 




such that Ri is now normahsed to have mean of zero and a variance of unity. The hypoth- 
esis of independence is then rejected or accepted depending on the value of Tj . One then 
constructs confidence levels of rejection by combining Ti into the single statistic r such that 

=^ (158) 

i 

where, by definition, a r of 1 indicates a 1-cr correlation and conversely, a r of indicates 
that the variables are completely uncorrected. 

Now, if the magnitude limit mfj^^^ is re-introduced as in Figure 17, the above method has 
to be modified to correctly account for the magnitude limit breaking the separability between 
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M and Z. This requires tlie construction of subsets for each galaxy located at {Mi, Zi) to 
correctly calculate the ranks Ri of the entire set of observables. Therefore, within each set, 
all objects which could have been observed up to m[jj^ limit are counted such that for each 
galaxy an area S is defined: 

oo < Mi < Mii„ 
and 

< < Zmax(M,), 

as illustrated in Figure 17. Within this subset Ri is uniformly distributed between 1 and 
Ni{S), where Ni{S) is the number of objects in each subset. With this amendment to the 
method, the construction of the r statistic remains the same with r once again being defined 
as, 

=— (159) 

where the expectation value E and variance V remain unchanged from Equation 156. The 
statistic r thus gives us an unbiased measure of the correlation of the data-set whilst properly 
accounting for the apparent magnitude limit. 

In Efron & Petrosian (1999) they extend this method for data which have a double trun- 
cation i.e. a survey that has distinct faint and bright flux limits. The natural progression was 
to explore cases where the value of |r| > 1, which implies random variables M and Z cannot 
be considered separable. In this case they assume that there is some form of luminosity evo- 
lution in the underlying data. And so by adopting a parametric form for luminosity evolution 
that varies with an evolutionary parameter, usually referred to as /?, they were able to show 
that for a particular value of /3, r(/J) would equal 0. 

Maloney & Petrosian (1999) apply these methods on various quasar samples to deter- 
mine the density functions and the luminosity evolution. These techniques have also been 
used by Hao et al. (2005) for AGN's to test the correlation between the host galaxy and nu- 
clear luminosity. Kocevski & Liang (2006) applied this to constrain luminosity evolutionary 
models of Gamma Ray Bursters (GRBs). 



10.2 The Tc and Tv completeness estimators 

In Rauzy & Hendry (2000) the authors drew upon the ideas from Efron & Petrosian (1992) 
to develop a non-parametric method to fit peculiar velocity field models. This methodology 
was later extended in Rauzy (2001) as a completeness test. The motivation for this statistical 
method was to introduce a non-parametric test that could determine the true completeness 
limit in apparent magnitude of a magnitude-redshift survey whilst retaining as few model 
assumptions as possible and being independent of spatial clustering. 

The fundamental approach of the Rauzy completeness is the same as Efron and Petrosian 
i.e assumption of separability between the absolute magnitude, M, data and the distance 
modulus Z data. As demonstrated in the previous section, the presence of a faint apparent 
magnitude limit introduces a correlation between the variables M and Z for observable 
galaxies. Thus to retain the assumption of separability and construct a completeness statistic, 
the author defines a separable region defined by the random variable The left-hand panel 
of Figure 1 8 illustrates how the variable ( is constructed such that it satisfies 
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Fig. 18 Schematic construction the random variable, C,, for the Rauzy (2001) completeness test (left-hand 
panel) and the Johnston, Teodoro, & Hendry (2007) extension (right-hand panel). In the left-hand plot the 
constructed regions Si and 5*2 are defined for 'trial' faint apparent magnitude limit for a typical galaxy at 
[Mi, Zi). Also shown is the true faint apparent magnitude limit rriy^^, within which the rectangular regions 

51 and 52 contain a joint distribution of M and Z that is separable. The right-hand plot now shows the 
extension where one accounts for the presence of a secondary bright apparent magnitude hmit. The Si and 

52 regions are uniquely defined for a sHce of fixed width, &Z, in corrected distance modulus, and for 'trial' 
faint apparent magnitude hmit . Also shown are the true bright and faint apparent magnitude limits 

and JTifj^^, within which the rectangular regions 5i and 52 once again contain a joint distribution of M and 
Z that is separable. 



F{M) 



where F{M) is the Cumulative Luminosity Function (CLF) defined as 

I'M 

F{M) = / f[x)dx. 

J —00 



M 

(161) 



It follows immediately from its definition that has the property of being uniformly dis- 
tributed between and 1 . The random variable C, can be estimated directly from the data, 
without prior knowledge of the functional form of the CLF and is independent of the spa- 
tial distribution of galaxies. Therefore, the estimator C,i can be combined for each observed 
galaxy into a single statistic, Tc, which can then be used to test the magnitude completeness 
of a given sample for adopted trial magnitude limits m^. Tc is defined as 

..^|(c.-i)/(|v.)", 

where Vi is the variance. If the sample is complete in apparent magnitude, for a given 
trial magnitude limit, then Tc should be normally distributed with mean zero and vari- 
ance unity. If, on the other hand, the trial faint magnitude limit is fainter than the true 
limit, Tc will become systematically negative, due to the systematic departure of the Q 
distribution from uniformity on the interval [0, 1]. The random variable was applied by 
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Fig. 19 Schematic construction of the rectangular regions Ss and 54, defined for a typical galaxy at 
(Mi, Zi), which feature in the estimation of the variant completeness test statistic, Ty. The left-hand panel 
illustrates how 53 and S4 are constructed for a survey with only a faint magnitude limit ™fi,-,-j, and are shown 
for a trial faint limit . The right-hand panel shows the case where the survey also has a true bright limit 
"^Um (*hich we assume for simplicity is known), and the rectangles are constructed for trial bright and faint 
limits and respectively. Note that the construction of the rectangles is unique for a 'slice' of 

fixed width, 5M, in absolute magnitude. 



Rauzy, Hendry, & D'Mellow (2001) in conjunction with the Kolmogrov-Smirnov test and 
the correlation coefficient p, to devise a method to fit luminosity function models to observ- 
ables. This method was applied to the Southern Sky Redshift Survey (SSRS2) sample of 
da Costa et al. (1988). 

The Rauzy completeness test has since been applied to a wide variety of data exploring 
e.g. the completeness limits of the HI mass function in the HffASS survey (Zwaan et al. 
2004), the HI flux completeness of ALFALFA survey data (Toribio et al. 2011), and for 
nearby dwarf galaxies in the local volume (Lee et al. 2009). A recent study by Devereux et al. 
(2009) adopted the Rauzy method to determine the completeness of multi-wavelength se- 
lected data. 

In Johnston, Teodoro, & Hendry (2007) they extended its usefulness to survey data that 
are characterised either by two distinct faint and bright flux limits and/or the case where 
the presence of a bright limit is harder to detect. This is illustrated in the right-hand panel 
of Figure 18. In addition to this they constructed a new statistic called Tv which was con- 
structed instead from the cumulative distribution function of the corrected distance modulus 
for observable galaxies and denoted by H{Z), such that 



H{Z) = / h[Z')dZ'. 

J —00 



(163) 
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Fig. 20 The Johnston, Teodoro, & Hendry (2007) Tc and Ty completeness test statistics applied to a SDSS 
Early Type galaxy sample consisting of 30,000 objects. Right panel: shows the distribution of sources in 
M and Z which is well defined by the respective faint and bright apparent magnitude limits m5?^_^=14.5 
and m[jj^^=17.45 mag. Left panel: beginning the test at the bright magnitude limit, Tc and are estimated 
for each trial apparent magnitude limit, m», moving toward and beyond the faint limit of the survey in 
increments of 0. 1 . The j/-axis indicates the completeness Hmits, where, for a complete sample, it is expected 
to fluctuate about zero. The true limit of the survey will be indicated by a systematic drop in Tc or below 
pre-determined limits, in this case — 3o". The vertical dashed lines indicate the respective limiting magnitudes 
of the survey sample, 'TiJ?^ and "ifj^ . 



Thus, a new random variable, r, was defined that mirrored the C, variable and was constructed 
for the two cases, 

^=Hi^r^. ''''' 

as illustrated, respectively, on the left- and right-hand panels in Figure 19. As with the Tc 
statistic, one can combine the estimator f j for each observed galaxy into a single statistic, 
Tv, which we can use to test the magnitude completeness of our sample for an adopted trial 
magnitude limit mt . Tc is defined as 




Tv = y An-T, \ y V, . (166) 



Tv has the exact properties as Tc such that if the sample is complete in apparent mag- 
nitude, for a given trial magnitude limit ?n*, Tv should be normally distributed with mean 
zero and variance unity. 

Figure 20 illustrates how the test works in practice when applied to real survey data. The 
sample under test comprises of early-type galaxies selected from the SDSS as described in 
Bernardi et al. (2003b, 2005). To summarise, the galaxies have been selected within the 
magnitude range 14.5 < rrir < 17.75 within a redshift range 0.05 < z < 0.3. The resulting 
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distribution in absolute magnitude M and distance modulus Z is shown on the right-hand 
panel of Figure 20. It should be noted that the magnitudes have been K-corrected. The left- 
hand panel shows the resulting completeness test for Tc (black solid line) and Tv (red solid 
line). Since this survey sample is well defined by a faint and bright apparent magnitude 
limit, the Johnston, Teodoro, & Hendry (2007) estimators are applied, adopting a respective 
fixed 5Z and 5M width of 0.2 for Tc and Ty. For both estimators the test begins with a 
trial magnitude limit equal to the bright limit of the sample m* = 14.5. Tc and Tv are 
then estimated for increasing values of m* in increments of 0. 1 . Note that the test does not 
register a result for either estimator until there are enough objects to be counted for the C, and 
T random variables. In this example this corresponds to an m» = 14.8. As the figure shows, 
both Tc and Tv fluctuate about zero within approximately the j l|cr limits. Both test statistics 
then systematically drop sharply below — 3ct at the actual magnitude limit of the sample. For 
this sample under test the completeness test indicates that there are no residual systematics 
between 14.5 < rrir < 17.75 and that the data are therefore complete in apparent magnitude 
up to the published survey limit. 

In Teodoro, Johnston, & Hendry (2010) extensions to the method were made to gen- 
eralise the approach. This extension based the completeness calculation on a minimum (or 
constant) signal-to-noise level computed directly from the ( (and/or r) defined regions. With 
the introduction of a secondary bright limit (Figure 18-right) one is now restricted with the 
volume subsample size for each galaxy. In studying the distribution of galaxies in the (M, Z) 
plane one is trying to understand the underlying luminosity function of a given population of 
galaxies as well as the way that such a function is sampled. To do so one uses a finite number 
of galaxies to estimate whatever measurements. This makes them prone to shot noise and 
also to the existence of some spurious features of the global properties of the data-set. 



11 Applications 

This review has focussed primarily on the statistical methodology surrounding the luminos- 
ity function in the context galaxy populations. This section now focuses on selected areas 
within astronomy where the LF has had a strong impact. 

It should firstly be noted that the LF has not been confined just to the study of galaxy 
populations. The stellar LF has had an equally long history (see e.g. Salpeter 1955; Sandage 
1957; Schwarzschild & Harm 1958; Limber 1960; Simoda & Kimura 1968; Miller & Scalo 
1979, for early studies). In particular, the stellar LF has proven useful when constraining 
stellar evolution rates (e.g. Sandquist et al. 1996; Zoccali & Piotto 2000; Hargis et al. 2004), 
age and distance determination to globular clusters as well as constraining stellar evolu- 
tion processes (see e.g. Bahcall & Yahil 1972; Renzini & Fusi Pecci 1988). More recently 
the Century Survey Galactic Halo project utilised the spectroscopic redshift surveys, SDSS 
(York et al. 2000) and 2MASS (Cutri et al. 2003), to survey blue horizontal branch (BHB) 
stars within the Galactic halo (Brown et al. 2003). In Brown et al. (2005) they applied the 
EEP88 step-wise LF estimator to construct the BHB LF and compare with globular cluster 
data. Other studies by e.g. Covey et al. (2008) explores the LF and mass functions of low 
mass stars extracting data from the same surveys. The Hubble Space Telescope has also been 
pivotal for the study of young star clusters within the Milky Way, and their corresponding 
LF studies have proved useful in probing the underlying mass function (e.g. Zhang & Fall 
1999; Gieles et al. 2006; Devine et al. 2008; Portegies Zwart et al. 2010). The white dwarf 
luminosity function (WDLF) is another galactic source which provides useful constraints 
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on star-formation rate and history of the Galactic disk (e.g. Liebert et al. 1988; Wood 1992; 
Harris et al. 2006). 

In terms of extragalactic sources, there are extensive LF studies covering a broad range 
wavelengths which include: UV selected Lyman break galaxies (LBGs) to probe the very 
high redshift Universe out to z ~ 7 (e.g. Steidel et al. 1996, 1999; Madau et al. 1996; 
Sawicki & Thompson 2006; Bouwens et al. 2008b; Stanway et al. 2008); HI selected galax- 
ies (e.g. Zwaan et al. 2001); and the radio LF (e.g. Willott et al. 2001; Auriemma et al. 
1977; Kaiser & Best 2007); mid- and far-infrared (Hacking et al. 1987; Saunders et al. 1990; 
Pozzi et al. 2004; Caputi et al. 2007; Rodighiero et al. 2010); sub-mm (e.g Dunne et al. 2000; 
Vaccari et al. 2010); X-ray (e.g. Ueda et al. 2003; Hasinger et al. 2005; Brusa et al. 2009; 
Aird et al. 2010); and gamma ray bursters (GRBs) (e.g. Schmidt 2001 ; Wanderman & Piran 
2010). 

The remainder of this section discusses in more detail some of the recent highlights re- 
sulting from optical LF studies at low and intermediate redshifts as well the LF contribution 
to probing the evolutionary processes of QSOs. 



11.1 The optical luminosity function 

The review by BST88 gave a very comprehensive overview of the impact of galaxy LF 
studies up to the end of the 1980s. This section now examines the work carried out by 
various groups from the 1990s to present day, and summaries the developments in probing 
the optical LF from low redshift ranges {z < 0.2) up to the more intermediate redshifts 
2 < 1.5. 

11.1.1 The local universe 

Prior to the emergence of large redshift surveys such as the Two Degree Field Galaxy Red- 
shift Survey (2dFGRS) and the Sloan Digital Sky Survey (SDSS), obtaining large complete 
samples of the local Universe was largely limited to the technological constraints of obtain- 
ing spectroscopic redshifts. Nevertheless, throughout the 1990s surveys such as Stromlo- 
APM Redshift Survey (S-APM) (Loveday et al. 1992), Las Campanas (LCRS) (Lin et al. 
1996) and the ESO Slice Project (ESP) (Zucca et al. 1997) made significant progress to 
constrain the LF to 0.05 < z < 0.2 of field galaxies. 

In 1992 the S-APM survey measured hj magnitudes of 1769 galaxies out to hj = 
17.15 mags at a redshift 2 ~ 1.0. They applied the MLEs of STY79 and EEP88 to esti- 
mate the LF and constrained Schechter parameters M* = —19.50 ± 0.13, a faint end slope 
of a = -0.97 ± 0.15 and a normalisation 0* = (1.4 ± 0.17) x 10"^ Mpc"^. LF results 
from the CfA survey were then published by Marzke et al. (1994b) using a sample of 9063 
galaxies out to mz < 15.5 and within a range —13 < Mz < —22. By applying the STY79 
and the EEP88 methods they determined an overall M* = -18.8±0.3, a = -1.0±0.2 and 
a normalisation of (j>* = (4.0 ±1.0) x 10~^ Mpc~'^ (see also Marzke et al. 1994a, for their 
extended analysis to morphological types). By 1996, the LCRS had imaged 23,690 objects 
from which 18,678 galaxies were selected out to a limiting r-band magnitude of r < 17.5 
and an average redshift of 2 = 0.1. It was one of the first to use multi-fibre spectroscopy, 
allowing between 50 and 1 12 redshifts to be measured simultaneously. They too adopted the 
same LF estimators to the data and measured LF parameters roughly consistent with S-APM 
survey. Table 4 shows a summary of the results to compare to other redshift surveys. 
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Fig. 21 Reproduced from Norberg et al. (2002) by permission. The top panel shows the 2dFGRS LF results 
using the non-parametric SWML and the parametric MLE estimators and compared to the SDSS LF results 
from Blanton et al. (2001). The bottom panel shows comparisons with Zucca et al. (1997) and Loveday et al. 
(1992). 



One year later, the ESP survey published LF results that pushed approximately 2 mag- 
nitudes fainter, out to bj = 19.4 mag but with a smaller sample of 3342 galaxies out to a 
median redshift at z ~ 1.0. To estimate the LF they use the STY79 MLE and the modi- 
fied non-parametric Lynden-Bell C~ estimator called, 'C+', as described by Equation 81 
on page 32. Whilst the M* and 4>* values were consistent with the previous two surveys, 
the faint-end slope was considerably different, yielding a slope of a = —1.22. The discrep- 
ancy in the faint-end slopes was thought to be more an effect of incompleteness and sample 
variance toward the faint end in LCRS and S-APM. 

However, as multi-fibre technology improved surveys like the 2dFGRS (Colless 1998) 
marked a new era of mapping the local Universe. With a final catalogue of ~ 220,000 
measured spectroscopic redshifts out to a Zmax ~ 0.3 and a limiting magnitude of bj < 
19.45 mag, the 2dFGRS was, at this point, the largest redshift survey compiled. Initial LF 
analysis was performed by Folkes et al. (1999) using a preliminary sample of 5869 galaxies 
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in the range 0.01 < z < 0.2 to bj = 19.45 mag. They applied the standard Schmidt Vma,x 
and STY79 estimators and constrained overall values of M* = —19.7 and a = —1.3, which 
showed a much closer consistency with the ESP LF parameters. As the survey reached 
its conclusion, later studies by Madgwick et al. (2002) (examining the LF as a function 
of spectral type) and Norberg et al. (2002), using respective sample sizes of 75,000 and 
110,500 galaxies, would confirm these original results with global LF parameters of M* = 
-19.66, a = -1.21 and , 0* = (1.21 ± 0.03) x 10"^ Mpc"^ and a luminosity density of 
jbj = (1.82 ± 0.17) X lO^ftLoMpc"^ from Norberg et al. (2002). 

Finally, the advent of SDSS (York et al. 2000) brought, for the first time, a single survey 
imaging the sky in five bands {u ,g' / ,i' ,z' centred at 3540, 4770, 6230, 7630 and 9130 A 
respectively). It overtook the 2dFGRS in terms of the number of objects and remains, for 
the time being, the largest photometric and spectroscopic survey compiled to date. With 
the final catalogue containing ~ 1,000,000 spectroscopically confirmed galaxy redshifts, 
SDSS represents the most accurate map of the nearby Universe out to z < 0.3. Particular 
to this survey was the use of non-standard filters (see Fukugita et al. 1996, for more details) 
which required converting to make direct comparisons to existing surveys. The first LF 
analysis reported by SDSS was by Blanton et al. (2001) using the a sample 11,275 galaxies 
of the commissioning data. As well as computing LFs all the SDSS bands, they were able 
to use the five-band colour information to convert their magnitudes into previous survey 
bands to make an accurate comparison of previous LF results. They converted to LCRS 
r-band and bj (to compare to 2dF). These results are also shown on Table 4. Generally, 
their measured LF parameters were closer to that of 2dF than LCRS. However, they noted 
their luminosity densities were 1.4 times higher to that of 2dF, as measured by Folkes et al. 
(1999) reporting jr = (2.6 ± 0.3) x lO^/tI/0Mpc~^, concluding that previous surveys had 
missed a considerable fraction of luminosity density in the Universe. Figure 21 shows LF 
comparisons of the S-APM, ESP and the SDSS Blanton et al. (2001) survey samples taken 
from the Norberg et al. (2002) paper. 

Around two years later, Blanton et al. (2003b) compiled a sampled from the second data 
release SDSS-DR2 consisting of 147,986 galaxies to measure the LF at z=0.1 in all SDSS 
bands. The main difference in their analysis with the previous paper was the incorporation 
of evolutionary model into the LF. This resulted in an r*-band Schechter function fit with 
M* = — 20.44 ± 0.01 and a = —1.05 ± 0.01, and a newly computed luminosity density of 
jr = (1.84 ± 0.04) X lO^/iLoMpc^^ which was now consistent with the 2dF. 

The Millennium Galaxy Catalogue (MGC) (Liske et al. 2003) used B-band imaging 
on the Wide Field Camera on the 2.5m Issac Newton Telescope to produce higher quality 
photometry compared to 2dF and SDSS. The survey was designed to be fully contained 
within the 2dF and SDSS-DRl fields and imaged 10,095 galaxies out to a limiting magnitude 
Bmgc < 20.0 mag within a redshift range 0.013 < z < 0.18. In Driver etal. (2005) 
they applied their bivariate Step- Wise LF estimator (as detailed in § 6.4) and constrained 
Schechter LF parameters A/*=19.60 ± 0.04, a = —1.13 ± 0.02 and a normalisation of 
0* = 1.77 ± 0.15. Whilst their Ah value was in line with 2dF, the recovered faint-end slope 
was flatter. Their resulting luminosity density of jf, , = (1.99 ± 0.17) x lO*/iL0Mpc~'^ was 
similar to that of 2dF. 

A more recent analysis by Montero-Dorta & Prada (2009) has included 516,891 galaxies 
in the r*-band from SDSS-DR6, achieving a much greater redshift completeness than the 
previous analyses. Their results made accurate constraints on both the bright and faint ends 
of the LF with r-*-band Schechter fits of M* = -20.73 ± 0.04 and a = -1.23 ± 0.02. 
They reported notable differences in the bright-end of the best-fit LF of an excess at the 
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Mu ^ —20.5. The excess then weakens moving in the g-band and fades away toward the 
2-band. It was concluded that this excess may be the result of QSOs. 

11.1.2 Probing evolution at intermediate redshifts 

Our recent ability to probe toward higher redshifts (z< 1.0) with significant enough number 
statistics, has opened a new window to the evolutionary processes of different galaxy popu- 
lations through the study of their LFs. However, it should be noted that making comparisons 
between various surveys has been a tricky process largely due to the ways in which galaxy 
types have been defined e.g. by spectral type or by morphology (see e.g. Ilbert et al. 2006, 
for a discussion). Nevertheless, over the last 15-20 years, an overall consistent picture of 
galaxy evolution at this redshift range seems to be reaching convergence (for reviews see 
Koo & Kron 1992; EUis 1997; Faber et al. 2007). 

Through the mid to late 1990s work in this area was largely led by groups such as 
the Canada-France Redshift Survey (CFRS) (Lilly et al. 1995), Autofib I & II (Ellis et al. 
1996; Heyl et al. 1997), the Hawaii Deep Fields (Cowie et al. 1996), the Canadian Network 
for Observational Cosmology survey (CNOCl & 2) (Lin et al. 1997, 1999) and the Norris 
Survey of the Corona Borealis Supercluster (Small et al. 1997; Lin et al. 1999). Whilst these 
survey samples typically ranged from hundreds to just under 2000 galaxies, a clear picture 
of evolution between 'red' early-type galaxies and 'blue' star-forming late-type galaxies 
had emerged in which star-forming galaxies were observed to have more rapid and stronger 
evolutionary properties compared to early-type galaxies. Table 5 provides a summary these 
and other surveys related to this study. 

The CFRS, using a sample of 730 /-band selected galaxies, published results that seemed 
to lend support to the monolithic collapse scenario (e.g. Eggen et al. 1962; Searle & Zinn 
1978) in which galaxies originate from large regions of primordial baryonic gas which then 
collapses to form stars within the central region of a dark matter halo, thus allowing the most 
massive galaxies to form first. They found that the LF of red galaxies showed no evidence for 
either number density or luminosity evolution over the range < 2 < 1.0. However, their 
results for the blue population of galaxies indicated evolution in the form of a steepening in 
the faint-end slope of the LF. 

The Autofib Redshift Survey explored both the global evolution properties of the B-band 
LF (Autofib I Ellis et al. 1996) and evolution by spectral type (Autofib II Heyl et al. 1997) 
out to 2 ~ 0.75 and covering a range in magnitude 11.5 < &,/ < 24.0. It was also this paper 
that saw extensions made to both the SWML method of EEP88 and the MLE of STY79 for 
the combining of multiple samples (see e.g. § 5.5 on page 33). The galaxy population were 
divided into early-type E/SOs, early spirals (Sab/Sbc) and late spirals (Scd/Sdm/starburst). 
For the early-types, they found no real evidence for evolution out to z ~ 0.5. The early- 
type spirals showed some evidence for evolution with slight steepening of the faint end 
slope of the LF as a function of redshift. Overall there seemed no significant change in both 
luminosity and number density for the early-type spiral galaxies. However, the late-type 
spirals showed significant evolution over the same redshift range in both luminosity and 
increased number density. Moreover, using [On] emission as an indicator of star-formation, 
they also found a steep increase in such emission implying a rapid increase in star formation 
rate as a function of redshift. 

At around the same time, the CNOC redshift surveys were being carried out, which 
by the end of the 1990s utilised a sample of ~ 2000 galaxies to probe evolution of the 
LF between 0.12 < z < 0.55 within the magnitude range 17.0 < Rc < 21.5 (Lin et al. 
1997, 1999, respectively known as CNOCl and CN0C2). At this time it was the largest 
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intermediate-redshift galaxy survey. They classified galaxies into early-types, intermediate- 
types and late-types, estimating rest-frame B, Rc, and U band LFs. Applying both the 
STY79 and EEP88 estimators they found that when looking back in redshift they observed 
their early-type sample showing an increasing luminosity evolution but with a reduced num- 
ber density. Their intermediate-types showed no evidence for number evolution but indicated 
a slight increase in luminosity. The late-type sample showed strong evidence for number 
evolution with stronger star formation at higher redshifts. Overall, they reported distinct sep- 
aration of luminosity (early-types) and density evolution (late-types). It was noted that due 
to differences in both the way galaxies were classified and evolutionary models adopted, 
direct comparison of CN0C2 and Autofib was difficult. However, in general the CN0C2 
group found their results qualitatively consistent. 

The COMBO- 17 (Wolf et al. 2003) survey marked a significant increase in the number 
of galaxies surveyed from 0.2 < z < 1.2, totalling 25,431 measured photometric redshifts 
out to a limiting magnitude R < 24.0. They provided a thorough study of evolution as a 
function of LF and comoving luminosity density at different rest-wavelengths with compar- 
isons made to the CFRS, CN0C2 and CADIS (Fried et al. 2001) surveys. The galaxies were 
split into four types based on templates from Kinney et al. (1996) where Types 1,2,3 & 4 
were, respectively, categorised as E-Sa, Sa-Sbc, Sbc-SB6 & SB6-SB1. After accounting for 
the differences in galaxy-type classification, they found their results to be consistent with 
CN0C2. They reported evolution of the LF was strongly SED type dependent in both den- 
sity normalisation and characteristic luminosity. For their early-type sample (Type 1) they 
found a faint-end slope of a = 0.5 ± 0.2 which then steepened to q = 1.5 ± 0.06 for 
bluer galaxies (Type 4). For the latest-type galaxies they found no real change in the den- 
sity normalisation, 4>* over their redshift range, but instead, found a progressively faint A/*. 
Conversely, their sample of early-type galaxies they found an order magnitude increase in 
cj>* from 2 = 1.2 to present day. A/*, however, did become progressively fainter, albeit to a 
smaller degree compared to the latest-type galaxies. 

Bell et al. (2004) described a model independent method for segregating galaxies into 
early- and late-types by identifying the bimodal rest-frame color distributions of galaxies 
out to 2 ~ 1. This would prove an invaluable approach for future studies. 

The VIMOS-VLT Deep Survey (VVDS) (e.g. Ilbert et al. 2005) combined spectroscopic 
redshifts in the Chandra Deep Field South and photometric redshifts from COMB 0-1 7 and 
multi-colour images from the HST/ACS Great Observatories Origin Deep Survey (GOODS) 
to probe evolution out to z ~ 1.5. Analysis of this survey was performed in two different 
ways: in Zucca et al. (2006) they classified galaxies according to spectral type, in Ilbert et al. 
(2006) they instead look at galaxy morphology for classification. 

As with COMBO-17 Zucca et al. (2006) split their sample population of 7713 galax- 
ies into four spectral types, but based their classifications on different templates using CCW 
Coleman et al. (1980). In this way the four types were classified as: E/SO (Type 1), early spi- 
rals (Type 2), late spirals (Type 3) and irregulars (Type 4). For their LF studies they applied 
their ALE algorithm and compared their results with previous surveys such as CN0C2 and 
COMBO-17. The comparisons indicated general agreement with CN0C2 in terms of neg- 
ative density evolution for early-types and a strong positive density evolution for late-type 
galaxies. However, unlike CN0C2, they did not observe luminosity evolution for early- 
types. Their comparison to COMBO-17 did yield notable differences in the shapes and nor- 
malisation of the LFs. For example, the main discrepancy that was reported concerned the 
evolution of the Type 1 galaxies. In COMBO-17 they saw very strong density evolution 
which decreased with increasing redshift. Such evolution was not observed in the VVDS 
sample. It was thought this may be due to differences in galaxy classification. As a test 
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they consolidated their Type 1 and 2 populations and re-estimated their LFs. Whilst this 
seemed to reduce the discrepancy the differences in the slopes and normalisation of the LFs 
remained. 

In Ilbert et al. (2006) their classification scheme of the VVDS data was morphologi- 
cal, based on structural features of galaxies within which three main populations were es- 
tablished - disk-dominated populations, red bulge-dominated populations and blue bulge- 
dominated populations. As in Zucca et al. (2006) they applied their ALF algorithm to com- 
pute the LFs and presented results from the rest-frame B-band LF out to 2 = 1.2. In general 
they observed a strong dependency on the shape of the LF according to morphological type. 
At redshift range 0.4 < z < 0.8 the red bulge-dominated population LFs that indicated 
a shallow faint-end slope of a = 0.55 ± 0.21. This was was in stark contrast to the disk 
population which showed a much steeper slope of q = — 1.19 ± 0.07. Due to the irregular 
galaxies being included in the disk population sample, it was expected that strong evolution 
would be observed. Instead, the opposite was found. It was thought the effects such as cos- 
mic variance and the domination of spiral galaxies in this sample were possible reasons for 
this effect. With the red bulge-dominated population they found that in general there was a 
distinct number density evolution with the age of the Universe consistent with the hierarchi- 
cal scenario, and indicating that E/SO galaxies are already established at 2: ~ 1. Finally the 
blue bulge-dominated population was found to have strong evolution with a brightening of 
~ 0.7 mag between 0.6 < 2 < 1.0. 

At around the same time as VDDS the Deep Extragalactic Evolutionary Probe 2 (DEEP2) 
(e.g. Willmer et al. 2006) survey team reported their B-band LF results of ~ 11, 000 galax- 
ies out to 2 ~ 1. As well as examining the global LF they also provide analysis for sub- 
divided blue and red populations (using colour bimodality). As with previous studies their 
findings showed red and blue galaxies evolving differently. In general, the blue galaxies 
show strong luminosity evolution but little number evolution, whereas the red galaxies show 
strong number evolution with little change in luminosity over the redshift range. Their re- 
sults showed similar trends to previous surveys such as COMBO-17. 

Faber et al. (2007) combined data from DEEP2 and COMBO-17 to produce a catalogue 
of 39,000 galaxies which helped reduce effects from cosmic variance and Poisson noise 
by 7% and 15% per redshift bin. Using the colour bimodality approach as first employed 
by Bell et al. (2004), galaxies were classified into blue and red sub samples. They found 
that for a fixed number density moving toward higher redshifts, the blue population showed 
a brightening in magnitude. The red galaxies, however, showed almost no evolution for a 
fixed absolute magnitude. Their Schechter LF fits showed good agreement with the original 
analysis in both DEEP2 and COMBO-17 at all redshifts. When combining their distant 
Schechter LF parameters with local estimates they concluded that the number density of 
blue galaxies has remained the same since 2=1, whilst the red galaxies have shown an 
increase in number density by a factor of ~ 2 since 2 ~ 1. This result is restricted to 
galaxies near L*g and does not extend to more luminous galaxies. 

Finally, the more recent and ongoing COSMOS survey (Scoville et al. 2007) and the red- 
shift follow-up zCOSMOS (Lilly 2007; Zucca et al. 2009) have published LF studies have 
using the 10k bright sample which, so far, consists of 10, 644 objects (Zucca et al. 2009). 
This paper estimates both the global LF (gLF) in the range 0.01 < 2 < 1.3 and the LF as 
function of both spectrophotometric and morphological type. In terms of spectrophotometric 
typing, galaxies were divided into similar categories as in Zucca et al. (2006) with Type 1 = 
E/SO, Type 2 = early spirals. Type 3 = late spirals and Type 4 = irregular and starburst galax- 
ies. As in previous studies, the ALF algorithm was applied to estimate all LFs. To allow for 
better constraints on luminosity evolution, the parameter a was fixed to the value obtained 
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in the redshift range 0.30 < z < 0.80. Overall, they found that evolutionary properties in 
both luminosity and number density was consistent with previous VVDS studies. To better 
constrain evolutionary trends their Type 3 and 4 populations were combined. With this the 
Type 1 galaxies showed both luminosity and number evolution with a brightening of M* 
by ~ 0.6 mag and a decrease in 0* from the lowest redshift bin [0.1 to 0.35] to the highest 
[0.75 to 1.00]. Whilst the Type 2 population showed an overall brightening of ~ 0.25 mag, 
there was no significant number evolution observed. Finally, the Type 3+4 galaxies showed 
evolution in both number and luminosity with a brightening of ~ 0.5 mag and an increase 
of 0* with redshift. 

Their morphological studies saw galaxies classified into three groups: early-types, spi- 
rals and irregulars, and were analysed over the same redshift range as the spectrophotometric 
types. The summary of these results showed that early-type galaxies dominate the bright- 
end of the LF at low redshifts (z < 0.35), whilst spirals tended to dominate at the faint-end. 
Their sample showed that at the redshift range 0.35 < z < 0.75 a positive luminosity evo- 
lution was observed in both the spiral and early-type populations. At the highest redshift 
range, z > 0.75, the irregulars were observed to show strong increase in number evolution 
toward high redshifts. 



11.2 TheQSOLF 

Quasi-stellar objects (QSO) were first identified in 1963 by Maarten Schmidt and early 
pioneering work by e.g. Schmidt (1963, 1968, 1972); Schmidt & Green (1983) confirmed 
that they had strong evolutionary properties with a steep increase in space densities with 
increasing redshift. Central to understanding their evolutionary process is estimating the 
quasar luminosity function (QLF) as a function of redshift. 

Research in this field has garnered renewed interest in recent times due, in part, to sur- 
veys such as the 2dF and the SDSS which increased the number of QSO detections by 
orders of magnitude, vastly improving on number statistics, providing more comprehensive 
catalogues to fainter limiting magnitudes. Furthermore, recent observational evidence by 
Kormendy & Richstone (1995); Magorrian et al. (1998); Ferrarese & Merritt (2000) has es- 
tablished a relationship between the evolution of galaxies with their central supermassive 
black holes. Thus, the QSO luminosity function (QLF) can potentially provide strong con- 
straints on SMBH formation and evolution (Haiman & Loeb 2001; Yu & Tremaine 2002), 
constraints on structure formation models coupled with the environmental processes such 
as AGN feedback (e.g. Small & Blandford 1992; Haehnelt & Rees 1993; Efstathiou & Rees 
1988; Kauffmann & Haehnelt 2000). 

Work by Boyle et al. (1988a,b) originally showed that at low redshifts the QLF can be 
well described by a broken double power-law of the form 

^{M,z) = -|^qo.4[A/-A/*](q-|-1) ^ IQQA[M-M-']{I3 + 1) ^^^'^^ 

with a break at M* and a bright-end slope a steeper than the faint-end slope /3. 

The work performed during the 1980s and early 1990s using parametric fits to data had 
built a picture for the QSO LF at redshifts 2 < 2.2 which was described by a steep bright 
end slope {a ~ —3.6) and a very flat faint-end slope (fi ~ —1.2) and showed strong evolu- 
tionary properties that could be described by fitting pure luminosity evolution models (PLE 
; see § 3.2) with a peak in space density at z ~ 2 - 3 (e.g. Osmer 1982; Marshall et al. 1983; 
Marshall 1985; Koo & Kron 1988; Boyle et al. 1990; Hartwick & Schade 1990; Warren et al. 
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1994). The understanding of how the QLF evolves with redshift was put on a more solid 
footing with the advent of surveys that brought the number of detections into the tens of 
thousands allowing accurate determination of the QLF at z < 2.5. 

The AGN LP has also been studied extensively in the X-ray. Whilst early measurements 
of the X-ray AGN LF by Boyle et al. (1993) showed evidence that evolution was following 
the standard PLE model, more recent results have also reported the AGN downsizing. In 
this scenario lower luminosity AGN peak in their comoving space density at lower redshifts 
(z < 1), whilst higher luminosity AGN peak at higher redshifts 2 ~ 2 (e.g. Ueda et al. 2003; 
Barger et al. 2005; Hasinger et al. 2005; Babic et al. 2007; Brusa et al. 2009) 

The advent of the The 2-degree Field QSO redshift survey (2QZ) (Boyle et al. 2000; 
Groom et al. 2001, 2004) provided a substantial increase in the number of QSOs catalogu- 
ing a total of 23,338 within the magnitude range 18.25 < bj < 20.85 out to 0.4 < 2 < 2.1. 
However, it was not until the 2dF-SDSS LRG and QSO (2SLAQ) survey (Richards et al. 
2005; Groom et al. 2009a) that evidence of downsizing was also observed. The 2SLAQ sur- 
vey combined photometry from SDSS-DRl (e.g. Gunn et al. 1998; Stoughton et al. 2002; 
Abazajian et al. 2003) to produce a catalogue of 10,637 QSOs. 2SLAQ provided a sam- 
ple 1 magnitude deeper than 2QZ allowing to probe the faint end of the QLF out to bj < 
21.85 mag and to a redshift of 2 < 2.6. By applying a similar approach to construct the QLF 
as in Groom et al. (2004), they provided constraints on both the faint and bright end of the 
QLF, finding evidence for number evolution in the form of downsizing which had not been 
observed in earlier studies by e.g. Richards et al. (2005) (using early results from 2SLAQ) 
and Wolf et al. (2003) (using the COMBO-17 survey). Using this catalogue. Groom et al. 
(2009b) showed how the activity in low-luminosity active galactic nuclei (AGNs) peaks at a 
lower redshift than that of more luminous AGNs. 

Accurately constraining the QLF at 2 > 3 is much more of a challenge where low 
numbers of low-luminosity objects at high redshift have meant faint end of the QLF is 
much more poorly understood. As such, whether or not downsizing is observed at earlier 
epochs remains a pertinent question. However, there have been made a lot of efforts to 
build larger catalogues (e.g. Fan et al. 2000, 2001, 2003, 2004; Richards et al. 2006; Goto 
2006; Siana et al. 2008; Glikman et al. 2010; Willott et al. 2010). There have been a number 
of studies of the faint-end slope at 2 ~ 3 (e.g. Hunt et al. 2004; Bongiomo et al. 2007; 
Fontanot et al. 2007; Siana et al. 2008). However, recent work by Ikeda et al. (2011) and 
initial studies by Glikman et al. (2010) have both provided constraints on the QLF at 2 ~ 4 
that, at first, yielded conflicting results. 

Glikman et al. (2010) using both Deep Lens Survey Wittman et al. (2002) and the NOAO 
Deep Wide-Field Survey (Jannuzi & Dey 1999) found 23 QSOs candidates between 3.74 < 
2 < 5.06 down to a limiting _R < 23 magnitude. To construct the QLF they applied the stan- 
dard Vmax estimator and also use the STY79 MLE to constrain the LF parameters of Equa- 
tion 167. They found that when combining their QLF data with that of the Richards et al. 
(2006) SDSS data, the QLF at 2 ~ 4 was best described by a single power-law with 
a /3 = -2.3 ± 0.2. However, Ikeda et al. (2011), using the GOSMOS survey, identified 
eight low-luminosity QSOs at this redshift within a magnitude range 22 < i' < 24 us- 
ing the FOCAS instrument on the Subaru Telescope. By applying a least-squares weighted 
fit to the QLF of Equation 167, their results yielded a much shallower faint-end slope of 
/? = — 1.671q ]^7, which they found to be more consistent with the downsizing scenario 
of AGN. Ikeda et al. suggested potential contamination of the Glikman et al. sample be- 
ing the cause for the large discrepancy. However, follow-up work by Glikman et al. (2011) 
examined an increased sample of a further 40 candidates. Whilst they have now found a 
P = — 1.6^Q g being in agreement with Ikeda, their normalisation is higher by a factor of 4. 
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It will be interesting to see how these results will converge as the number statistics improve 
in the future. 

Work continues exploring the most distant QLFs out to z ~ 6 using i-band dropout 
techniques (see e.g. Fan et al. 2003, 2004; Richards et al. 2004; Fan et al. 2006; Jiang et al. 
2009; Willott et al. 2010). The SDSS and the Canada-France High-z Quasar Survey (CFHQS) 
(e.g. Willott et al. 2009) have increased the number of high redshift quasars at z ~ 6 to 
~ 60. Work by Fan et al. (2003, 2004) and Richards et al. (2004) indicated a bright-end 
slope reaching a value close to /3 ~ —4. Using SDSS samples, Jiang et al. (2009) found a 
significantly shallower slope at /3 = — 2.6±0.3. It was suggested by Willott et al. (2010) that 
there may be a break in the QLF at A/1450 ~ —26 which could account for this discrepancy; 
a feature that they also found in their analysis using data from the Canada-France High-z 
Quasar Survey (CFHQS) combined with SDSS that identify 40 QSOs candidates at redshift 
range 5.74 < z < 6.42. In their analysis, they found a bright-end slope of /3 = —2.81. 

At the more intermediate redshift ranges, the QSO LF seems very well constrained and 
significant progress has already been made in probing the high redshift QSO LF. However, it 
is clear that, in general, many more QSOs identifications are required to effectively constrain 
QLFs at earlier epochs if they are to be used in constraining AGN and SMBH formation 
evolutionary models. 



Table 4. Summary of LF parameters from selected redshift surveys at low redshifts. The 
columns are as follows: (1) the paper from which the LF parameters were computed, (2) 
the limiting apparent magnitude for the survey, (3) number of galaxies in the sample, (4) 
the redshift the LF was computed at, (5) (6) & (7) are the LF parameters, and (8) lists the 

LF estimators used: STY79 is the parametric maximum likelihood estimator, EEP88 is the 
Step Wise MLE and C+ is the modified estimator. 
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Table 5. Summary of galaxy LF evolution studies at intermediate redshifts. 



Survey Data 






Redshift 


LF Method 













LF studies out to z < 1.5 



CFRS (Lilly et al. 1995) 


17.5 < Iab < 22.50 


730 


< 2 < 1.0 


STY79, EEP88 


Autofib I (Ellis et al. 1996) 


11.5 <bj < 24.0 


1700 


<z< 0.75 


l/Knax,EEP88 


Hawaii Deep Fields (Cowie et al. 1996) 


K = 20.0,/ = 23, B = 24.5 


393 


0.2 < 2 < 1.7 


1/ Vniax 


CNOCl (Lin et al. 1997) 


variable 


389 


0.2 < 2 < 0.6 


STY79. EEP88 


Autofib II (Heyletal. 1997) 


11.5 <bj < 24.0 


1700 


<z < 0.75 


STY79, EEP88 


NORRIS (Small et al. 1997) 


14 < r < 20.0 


493 


< 2 < 0.5 


EEP88 


HST imaging of CFRS & Autofib 




341 


z $ 1-0 


1/ Vmax 


(Brinchmann et al. 1998) 










CNOC2 (Lin et al. 1999) 


17.0 <Rc < 21.5 


~ 2000 


0.12 < z < 0.55 


STY79. EEP88 


CADIS (Fried et al. 2001) 


/815 = 30.0 


2779 


0.1 < 2 < 1.0 


1/Knax, STY79 


CFGRS (Cohen 2002) 


R < 24.0 


553 


0.01 < 2 < 1.5 


STY79 


DEEP Groth Strip Survey (Im et al. 2002) 


16.5 < I < 22.0 


145 


z < 1-0 


1/Knax, STY79 


COMBO-17 (Wolf et al. 2003) 


R < 24.0 


~ 25, 000 


0.2 < 2 < 1.2 


l/14nax, STY79 


ESO-S (de Lapparent et al. 2003) 


Rc < 20.5 


617 


O.K 2 < 0.5 


STY79, EEP88 


VIMOS VLT Deep (Ubert et al. 2006) 


Jab = 24.0 


~ 4160 


0.4 < 2 < 1.0 


1/V^max, C+, STY79, EEP88 


VIMOS VLT Deep (Zucca et al. 2006) 


Jab = 24.0 


7713 


0-05 < 2 < 1.5 


1/I4nax, C+, STY79. EEP88 


DEEP2 (Willmer et al. 2006) 


Rab ~ 25.5 


~ 10, 000 


z < 1-0 


1/V^max (Eales 1993), STY79 


zCOSMOS (Zucca et al. 2009) 


15 < / < 22.5 


8478 


z < 1.0 


l/Vmax, C+, STY79, EEP88 



82 



Russell Johnston 



12 Concluding Remarks 

This review is an attempt to consolidate all the most innovative statistics developed for 
estimating the LF from their early beginnings 75 years ago to present day. Figures 22 and 
23 show a time-line diagram which charts the genealogical progress of all these estimators 
(a larger high-resolution version is available on request). 

Within the non-parametric regime it was discussed that whilst the traditional number 
count classical approach is straightforward in its construction, it is limited by its assumption 
of spatial homogeneity - a limitation also shared with the 1/Knax estimator. This intrinsic 
bias of Knax has been demonstrated by W97 and TYIOO when direct comparisons to other 
LF estimators showed overestimation of the LF particularly at the faint-end slope. 

In terms of the predictive power of the V/Vmax counterpart I would add a cautionary 
note. Probing evolution in data can pose degenerate qualities making it difficult to deter- 
mine whether a significant departure from the expectation value of V/Knax (~ 1/2) is due 
to inhomogeneity effects (introduced by clustering) or evolution, or simply an indication of 
underlying incompleteness of the catalogue from other sources of contamination. Neverthe- 
less, this seems not to have deterred its steady growth in popularity within the astronomical 
community as is evident from its numerous extensions to multiple surveys with varying flux 
limits, diameter-limited surveys, fitting generic LFs and adaptation to photometric redshifts 
etc... In fact, l/Vmax and V/Vma.^ remain one of the most widely applied non-parametric 
estimators for analysing the statistical properties of extragalactic sources. This is most likely 
due to the ease with which it can be applied. 

The construction of the (t>/(!> estimator and, in particular, the method offered a way 
to effectively circumvent the assumption of homogeneity by the cancellation of the density 
terms within their construction. The C~ method could be considered as a breakthrough since 
the method required sampling the CLF and therefore no binning of the data was required 
(see Andreon et al. 2005, for an interesting discussion on the pitfalls of binning). And yet, 
despite the fact that it was pioneered just three years after Schmidt's estimator and eight 
years before (t>/$, it did not grow in popularity as other tests did. Petrosian (1992) also 
demonstrated that all non-parametric methods are essentially variations of the C~ method 
under the limit of having one galaxy per bin. It is therefore a slight mystery as to why this 
approach has not been applied more often. Perhaps this is due to other maximum likelihood 
estimators (MLE) that developed as a result. 

Probably the most notable of the MLEs is the so-called STY79 estimator named after 
its developers, Sandage, Tammann and Yahil in 1979. Its main appeal is that one must adopt 
a analytical form for the LF and estimate its parameters via a maximum likelihood process. 
The form of the LF one chooses is rather arbitrary and in general the Schechter function 
(Schechter 1976) is favoured due to its robustness to various survey data. Approaching the 
problem this way avoids many of the problems associated with binning techniques and also 
avoids issues of density inhomogeneities. However, one needs to assess the goodness-of-fit 
and determine the normalisation via independent means. 

EEP88 extended the ideas of Lynden-Bell's method and the STY79 MLE by devel- 
oping the non-parametric case of the MLE by replacing the analytical form of the LF with a 
series of step functions. This step-wise maximum likelihood (SWML) method has become 
an equally favoured estimator in recent times. It, like Vmax has seen various extensions for 
e.g. bivariate distributions and combining multiple surveys to improve its usage. 

Also examined in detail were the more recent LF estimators that have emerged within 
the last few years that depart from the traditional approaches. The first one, developed by 
Schafer (2007), states key potential advantages for its construction that include making no 
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Strict assumption of separability between the probability densities p{z) and <P{M) and in- 
corporating a varying selection function into the method. The second by Kelly et al. (2008) 
takes a Bayesian approach adopting a mixture of Gaussian functions for the prior distribu- 
tion. Finally, the more recent Takeuchi (2010) bivariate construction of the LF applied the 
copula which connects two distribution functions. This parametric technique was applied to 
the FUV-FIR BLF 

The final part of the review shifted focus to one of the basic assumptions of most LF 
estimators - the assumption of separability between the luminosity function 4>{M) and the 
density function p{z). By building on Lynden Bell's C~ method, Efron & Petrosian (1992) 
constructed test statistic that can serve as a test of correlation between the assumed inde- 
pendent variables M and z and furthermore be used a to constrain pure luminosity models, 
where any such evolution would introduce correlations in the (M, z) distribution. It was then 
demonstrated how Rauzy (2001) built upon the ideas in EP92 and turned them into a sim- 
ple but powerful robust non-parametric test of magnitude completeness. This completeness 
test does not require any modelling of the LF and unlike V^/Vmax is independent of the 
spatial distribution of sources. Moreover, it returns a differential measure of completeness 
where one can assess the level of incompleteness over a range of magnitudes by applying 
successive trial apparent magnitude limits. 

What has probably become obvious through this review is that choosing the correct LF 
estimator for your data is not a straight forward task. However, recent statistical advances 
mean that there are a myriad of options available. This decision usually comes down to taste 
and experience, and perhaps more crucially, how far one is willing to extend their analysis 
beyond simple applications to obtain rigorous, if more computationally challenging, results. 
For now, the practice of applying several complementary LF estimators to a given data-set 
provides a reasonable, if time consuming, approach. 

In closing, the notion that we have entered into an era of precision cosmology has been 
cited for more than a decade (Turner 1998) and can perhaps be justified when applied to 
CMBr measurements (e.g. Spergel et al. 2003; Larson et al. 201 1). However, in the study of 
galaxy redshift surveys, the era of precision cosmology would appear to be approaching but 
will require improvement in both the quality and size of our data-sets and, crucially, in our 
statistical toolbox before we can claim that is has truly arrived. Whether the more recent 
methods will become as popular as the traditional ones remains to be seen, since shifts 
towards seemingly more complex techniques can take time to catch on. Nevertheless, it is 
encouraging that this branch of astronomy continues to develop and embrace new statistical 
methods. With with the next generation of galaxy redshift surveys on the horizon, it will be 
interesting to see how the current approaches to constraining the luminosity function will 
adapt and which new methods will come to prominence. 
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The Slatistical Development For Eslimaling the Luminosily Funclioi 
(Part 1) 
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Fig. 22 Part 1 of a schematic charting the development of all the major statistical methods that estimate the 
galaxy LF. A larger high resolution chart is available on request. Fig. 23 shows Part 2. 
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The Slalislical Developmenl For Eslimaling the Luminosily Funclioi 
(Part 2) 
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Fig. 23 Part 2 of a schematic charting the development of all the major statistical methods that estimate the 
galaxy LF. A larger high resolution chart is available on request. 
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